The longest palindrome in a string using the suffix tree

I tried to find the longest palindrome in the string. A brute force decision takes O (n ^ 3) time. I read that it uses a linear time algorithm using suffixes. I am familiar with suffix trees, and itโ€™s more convenient for me to build them. How do you use the constructed suffix tree to find the longest palindrome.

+44
algorithm palindrome suffix-tree
Aug 12 '11 at 17:10
source share
5 answers

I believe that you need to proceed as follows:

Let y 1 y 2 ... y n be your string (where y i are letters).

Create a generalized suffix tree S f = y 1 y 2 ... y n $ and S r = y n y n - 1 ... y 1 # (cancel the letters and select different ending characters for S f ($) and S r (#)) ... where S f means "String, Forward" and S r means "String, Reverse".

For each suffix i from S f, find the smallest common ancestor with the suffix n - i + 1 in S r .

What happens from the root until this lowest common ancestor becomes a palindrome, because now the lowest common ancestor is the longest common prefix of these two suffixes. Recall that:

(1) The suffix prefix is โ€‹โ€‹a substring.

(2) A palindrome is a string identical to its inverse.

(3) Thus, the longest containing palindrome inside a string is the longest common substring of this string and its inverse.

(4) Thus, the longest containing palindrome inside the string is the longest common prefix of all pairs of suffixes between the string and its inverse. This is what we do here.

Example

Take the word banana.

S f = banana $

S r = ananab #

Below is a generalized tree of suffixes S f and S r , where the number at the end of each path is the index of the corresponding suffix. There is a small error common to all three branches of the Blue_4 parent should be on its input edge, next to n:

enter image description here

The lowest inner part of a node in a tree is the longest common substring of this string and its inverse. Looking at all the internal nodes in the tree, you will therefore find the longest palindrome.

The longest palindrome is found between green_0 and blue_1 (i.e. banana and anana) and is anana




EDIT

I just found this article that answers this question.

+29
Aug 12 '11 at 18:30
source share

A linear solution can be found along this path:

Prequisities:

(one). You need to know how to build a suffix array in O (N) or O (NlogN) time.

(2). You need to know how to find the standard LCP array, i.e. LCP between adjacent suffixes i and i-1

those. LCP [i] = LCP (suffix i in a sorted array, suffix i-1 in a sorted array) for (i> 0).

Let S be the original string, and S 'the reverse side of the original string. Let's take S = " banana " as an example. Then its return line is S '= ananab.

Step 1 : combine S + # + S ' to get String Str, where # is the alphabet that is not in the original string.

Concatenated String Str=S+#+S' Str="banana#ananab" 

Step 2: Now build an array of str suffix.

In this example, the suffix array is:

 Suffix Number Index Sorted Suffix 0 6 #ananab 1 5 a#ananab 2 11 ab 3 3 ana#ananab 4 9 anab 5 1 anana#ananab 6 7 ananab 7 12 b 8 0 banana#ananab 9 4 na#ananab 10 10 nab 11 2 nana#ananab 12 8 nanab 

Note that the suffix array is an array of integers giving the starting positions of the suffixes of the string in lexicographic order. So, the array containing the index of the starting position is an array of suffixes.

This is SuffixArray [] = {6,5,11,3,9,1,7,12,0,4,10,2,8};

Step 3 Since you were able to create an array of suffix, find the longest common prefixes between adjacent suffixes .

 LCP between #ananab a#ananab is :=0 LCP between a#ananab ab is :=1 LCP between ab ana#ananab is :=1 LCP between ana#ananab anab is :=3 LCP between anab anana#ananab is :=3 LCP between anana#ananab ananab is :=5 LCP between ananab b is :=0 LCP between b banana#ananab is :=1 LCP between banana#ananab na#ananab is :=0 LCP between na#ananab nab is :=2 LCP between nab nana#ananab is :=2 LCP between nana#ananab nanab is :=4 

Thus, the LCP array is LCP = {0,0,1,1,3,3,5,0,1,0,2,2,4}.

Where LCP [i] = The length of the longest common prefix between the suffix i and the suffix (i-1). (for i> 0)

Step 4:

Now you have built the LCP array, use the following logic.

  Let the length of the Longest Palindrome ,longestlength:=0 (Initially) Let Position:=0. for(int i=1;i<Len;++i) { //Note that Len=Length of Original String +"#"+ Reverse String if((LCP[i]>longestlength)) { //Note Actual Len=Length of original Input string . if((suffixArray[i-1]<actuallen && suffixArray[i]>actuallen)||(suffixArray[i]<actuallen && suffixArray[i-1]>actuallen)) { //print :Calculating Longest Prefixes b/w suffixArray[i-1] AND suffixArray[i] longestlength=LCP[i]; //print The Longest Prefix b/w them is .. //print The Length is :longestlength:=LCP[i]; Position=suffixArray[i]; } } } So the length of Longest Palindrome :=longestlength; and the longest palindrome is:=Str[position,position+longestlength-1]; 

Execution Example ::

  actuallen=Length of banana:=6 Len=Length of "banana#ananab" :=13. Calculating Longest Prefixes b/wa#ananab AND ab The Longest Prefix b/w them is :a The Length is :longestlength:= 1 Position:= 11 Calculating Longest Prefixes b/w ana#ananab AND anab The Longest Prefix b/w them is :ana The Length is :longestlength:= 3 Position:=9 Calculating Longest Prefixes b/w anana#ananab AND ananab The Longest Prefix b/w them is :anana The Length is :longestlength:= 5 Position:= 7 So Answer =5. And the Longest Palindrome is :=Str[7,7+5-1]=anana 

Just take a note ::

The if condition in step 4 basically refers to the fact that at each iteration (i), if I accept the suffixes s1 (i) and s2 (i-1), then "s1 should contain # and s2 should not contain #" OR " s2 must contain # and s1 must not contain # ".

  |(1:BANANA#ANANAB)|leaf tree:| | | | |(7:#ANANAB)|leaf | | |(5:NA)| | | | |(13:B)|leaf | |(3:NA)| | | |(7:#ANANAB)|leaf | | | | | |(13:B)|leaf |(2:A)| | |(7:#ANANAB)|leaf | | | |(13:B)|leaf | | | |(7:#ANANAB)|leaf | |(5:NA)| | | |(13:B)|leaf |(3:NA)| | |(7:#ANANAB)|leaf | | | |(13:B)|leaf | |(7:#ANANAB)|leaf 
+27
Jun 29 2018-12-12T00:
source share

A few years later ...

Suppose s is the source string and r is s . Suppose also that we have completely constructed the ST suffix tree using s .

Our next step is to check all r suffixes on ST . With each new suffix r we will support counting the first k characters that we have successfully mapped to the existing suffix in the tree (i.e., One of the suffix s ).

As an example, let's say that we map the suffix โ€œRATโ€ to r , and s contains some suffixes starting with โ€œRAโ€, but none of them match โ€œRATโ€. k would be 2 when we finally had to give up hope of the final โ€œTโ€ characters. We match the first two characters of the suffix r with the first two characters of the suffix s . We will call this node that we have reached n .

Now, how do we know when we found the palindrome? Checking all leaf nodes under n .

In a traditional suffix tree, the starting index of each suffix is โ€‹โ€‹stored on the node sheet of this suffix branch. In our example above, s can contain many suffixes starting with "RA", each of which begins with one of the indices present in the descendants of node sheet n .

Use these indexes.

What does it mean if we compare the characters k one of the r substrings with the characters k in ST ? Well, that just means that we found some line in the reverse order. But what does this mean if the place where the substring in r begins is equal to the matched substring in s plus k ? Yes, this means that s[i] through s[i+k] reads the same as s[i+k] through s[i] ! So, whether it is a definition, we found a palindrome of size k .

Now all you have to do is save the tab on the longest palindrome found so far and return it at the end of your function.

+3
Jan 08
source share

A simple and short explanation from Skiena - The Algorithm Design Manual

Find the longest palindrome in S [using the suffix tree]. A palindrome is a line that reads the same if the order of characters is reversed, for example, madame. To find the longest palindrome in line S, build a single suffix tree containing all the suffixes S and the reversal S, with each leaf identified by its original position. A palindrome is defined by any node in this tree that has back and forth children from the same position.

+1
Nov 01 '14 at 12:31
source share

DP Solution:

 int longestPalin(char *str) { n = strlen(str); bool table[n][n]l memset(table, 0, sizeof(table)); int start = 0; for(int i=0; i<n; ++i) table[i][i] = true; int maxlen = 1; for(int i=0; i<n-1; ++i) { if(str[i] == str[i+1]) { table[i][i] = true; start = i; maxlen = 2; } } for(int k=3; k<=n; ++k) { for(int i=0; i<n-k+1; ++i) { int j = n+k-1; if(str[i] == str[j] && table[i+1][j-1]) { table[i][j] = true; if(k > maxlen) { start = i; maxlen = k; } } } } print(str, start, start+maxlen-1); return maxlen; } 
-one
Jan 06 '14 at 19:11
source share



All Articles