An algorithm for finding the length of the longest suffix between a string and a string prefix

Input:

There is a long string S , and we have an array of integers A , which denotes the prefixes String S , as A[i] , denotes the prefix S[0..A[i]]

Output:

Returns an Output[] array of the same size as A , where Output[i] is the length of the longest matching suffix S[0..A[i]] and S

Input Example:

 S = "ababa" A[]=[0, 1, 2, 3, 4] 

Output result:

Output[]=[1,0,3,0,5]

The most naive algorithm I have for each A[i] simply corresponds to the number of characters between S[0..A[i]] and S from the end of both lines. But this algorithm is O(n^2) , where n is the length of the original string S.

Question:
Is there a better algorithm that pre-processes the string S, and then can quickly return a long suffix for the entire input array?

+5
source share
2 answers

This is just the Z-function of the modified string. A slight difference is that the first element of the Z-function is chosen equal to the length S. There is an algorithm for computing the Z-function of a string in O (n)

And the algorithm for this task is as follows:

  • S ': = reverse S
  • Z: = Z-function of S '
  • for each i, Output [i]: = Z [Len (S) - A [i] - 1]

For instance:

 S = "baabaaa" A[] = [0,1,2,3,4,5,6] Output[] should be [0,1,2,0,1,2,7] S' = "aaabaab" Z = Z-function(S') = [7,2,1,0,2,1,0] (with the first element chosen to be Len(S)) 
+4
source

The algorithm / data structure you are looking for is called the suffix tree , has worse complexity O(n log n)

In computer science, the suffix tree (also called the PAT tree or in its earlier form, the position tree) is a compressed trie containing all suffixes of a given text as their keys and positions in the text as their meanings. Suffix trees allow you to quickly implement many important string operations. ( wiki )

Here you can find slides that detail functionality and implementation.

-1
source

Source: https://habr.com/ru/post/1233123/


All Articles