Find a repeating substring of length N

I need to create a Java program that finds all duplicate substrings of length n in a given string. Line input is very long and brute force takes too long.

I tried:
Currently, I find each substring separately and check for repetitions of that substring using KMP alogrithm . It takes too much time.

What is a more effective approach to this problem?

+4
source share
2 answers

1) You should look at the data structure of the suffix tree.

Suffix tree

O (N * log N)
( , O (N) )
N - / .
( ) O (M) , M - / .

, , , , , ( O).

2) ( ) :

,

, .
, - ;)

3) .

-

, ... .

+3

@peter.petrov , , :

 1. Create a suffix tree from the string, let it be `T`.
 2. Find all nodes of depth `n` in the tree, let that set of nodes be `S`. This can be done using DFS, for example.
 3. For each node `n` in `S`, do the following:
     3.1. Do a DFS, and count the number of terminals `n` leads to. Let this number be `count`
     3.2. If `count>1`, yield the substring that is related to `n` (the path from root to `n`), and `count`

, n S, , , , .

, - O(Creation + Traversal) - , , ( , 2-3 node ). "", , O(Creation), , @perer.petrov, O(|S|) O(|S|log|S|) .

+2

Source: https://habr.com/ru/post/1569866/


All Articles