Find multiple overlapping and non-overlapping substrings in a string

Question


PS: This is not a duplicate of "How to find the overlap between 2 sequences and return it".

[Although I would also welcome solutions based on the approach from that question, if it can be applied to the problem below.]

Q: Although my solution gives correct results, it is not scalable and is definitely not optimized (it gets a low score). Please read the following problem description and suggest a better solution.

Question:

For simplicity, we require that prefixes and suffixes be non-empty and shorter than the whole string S. A border of a string S is any string that is both a prefix and a suffix of S. For example, "cut" is a border of the string "cutletcut", and the string "barbararhubarb" has two borders: "b" and "barb". Write a function:

    class Solution { public int solution(String S); }

which, given a string S consisting of N characters, returns the length of its longest border that has at least three non-overlapping occurrences in S. If there is no such border in S, the function should return 0.

For instance,

if S = "barbararhubarb", the function should return 1, as described above;
if S = "ababab", the function should return 2, since "ab" and "abab" are both borders of S, but only "ab" has three non-overlapping occurrences;
if S = "baaab", the function should return 0, since its only border "b" occurs only twice.
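
As a reading aid for the examples above, here is a minimal brute-force sketch of the specification. It is not part of the original task text and is far from the required O(N); the function name longest_border_bruteforce is made up for illustration. It relies on the fact that Python's str.count counts non-overlapping occurrences, and it only tries lengths up to N // 3, because three non-overlapping copies of a border of length L need 3 * L <= N.

```python
def longest_border_bruteforce(s):
    """Reference check of the specification; roughly quadratic in the worst case."""
    n = len(s)
    # Three non-overlapping copies of length `length` need 3 * length <= n.
    for length in range(n // 3, 0, -1):
        border = s[:length]
        # str.count counts non-overlapping occurrences, scanning left to right.
        if s.endswith(border) and s.count(border) >= 3:
            return length
    return 0


if __name__ == "__main__":
    for word in ("barbararhubarb", "ababab", "baaab"):
        print(word, longest_border_bruteforce(word))
```

A sketch like this is mainly useful as an oracle for testing a faster solution on small random strings.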

Assume that:

N is an integer in the range [0..1,000,000];
string S consists only of lowercase letters (a-z).

Complexity:

expected worst-case time complexity is O(N);
expected worst-case space complexity is O(N) (not counting the storage required for input arguments).

    def solution(S):
        S = S.lower()
        presuf = []
        f = l = str()
        rank = []
        wordlen = len(S)
        for i, j in enumerate(S):
            y = -i-1
            f += S[i]
            l = S[y] + l
            if f==l and f != S:
                #print f,l
                new=S[i+1:-i-1]
                mindex = new.find(f)
                if mindex != -1:
                    mid = f #new[mindex]
                    #print mid
                else:
                    mid = None
                presuf.append((f,mid,l,(i,y)))
        #print presuf
        for i,j,k,o in presuf:
            if o[0]<wordlen+o[-1]: #non overlapping
                if i==j:
                    rank.append(len(i))
                else:
                    rank.append(0)
        if len(rank)==0:
            return 0
        else:
            return max(rank)

The time complexity of my solution is O(N**2) or O(N**4). Any help is appreciated.


python python-2.7

user2290820 Jul 05 '13 at 19:47


5 answers

Cong Nguyen · Answer 1 · 2013-08-02T03:00:02+0000

My solution is a combination of the Rabin-Karp and Knuth-Morris-Pratt algorithms. http://codility.com/cert/view/certB6J4FV-W89WX4ZABTDRVAG6/details
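
The answer itself gives no code, so the following is not the answerer's solution. It is only a small sketch of the Knuth-Morris-Pratt ingredient: the prefix function (failure function) lists every border of S in O(N). The names prefix_function and border_lengths are chosen for illustration, and the check for a third, non-overlapping occurrence is deliberately left out.

```python
def prefix_function(s):
    """pi[i] = length of the longest proper border of s[:i + 1]."""
    pi = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        # Fall back along shorter borders until the next character matches.
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def border_lengths(s):
    """All border lengths of s, longest first."""
    if not s:
        return []
    pi = prefix_function(s)
    lengths = []
    k = pi[-1]
    while k > 0:
        lengths.append(k)
        k = pi[k - 1]  # the next shorter border is a border of the current one
    return lengths


if __name__ == "__main__":
    print(border_lengths("barbararhubarb"))
    print(border_lengths("ababab"))
```

Running a separate substring search for every candidate border can still be quadratic on inputs such as a long run of the same letter, so a full solution needs a cheaper way to test for the middle occurrence.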

Vincenzo melandri · Answer 2 · 2013-07-09T06:14:08+0000

I have a (Java) solution that runs in O(N) or O(N**3), for a total score of 90/100, but I cannot figure out how to get it past two of the test cases:

almost_all_same_letters (aaaaa ... aa ?? aaaa ?? .... aaaaaaa): 2.150 s. TIMEOUT ERROR, running time: > 2.15 s, time limit: 1.20 s.

same_letters_on_both_ends: 2.120 s. TIMEOUT ERROR, running time: > 2.12 s, time limit: 1.24 s.

Edit: Nailed it! Now I have a solution that runs in O(N) and passes all the checks for a 100/100 result :) I did not know Codility before, but it is a good tool!

Ivor prebeg · Answer 3 · 2013-07-14T16:01:08+0000

I have a solution using suffix arrays (there is an algorithm for constructing the SA and LCP arrays in linear time, or in something slightly worse, but certainly not quadratic).

I am still not sure whether I can do without RMQ (O(log n) with a segment tree), which I could not get to pass my own test cases and which seems rather complicated, but with RMQ it can be done (not to mention the approach with a for loop instead of RMQ, which would be quadratic anyway).

The solution runs fairly quickly and passes the 21 test cases of various kinds that I managed to come up with, but it still fails some tests. I am not sure whether this helps you or gives you an idea of how to approach the problem, but I am sure that a naive solution, like the one @Vicenco mentioned in one of his comments, cannot get you better than Silver.

EDIT: I managed to fix all the correctness problems, but the solution is still too slow. I had to enforce some extra conditions, which made the solution more complicated, and I am not sure how to optimize it. Will keep you posted. Good luck!

Binesh · Answer 4 · 2014-12-16T15:51:45+0000

    protected int calcBorder(String input) {
        if (null != input) {
            int mean = (input.length() / 3);
            while (mean >= 1) {
                if (input.substring(0, mean).equals(
                        input.substring(input.length() - mean))) {
                    String reference = input.substring(0, mean);
                    String temp = input
                            .substring(mean, (input.length() - mean));
                    int startIndex = 0;
                    int endIndex = mean;
                    int count = 2;
                    while (endIndex <= temp.length()) {
                        if (reference.equals(temp.substring(startIndex, endIndex))) {
                            count++;
                            if (count >= 3) {
                                return reference.length();
                            }
                        }
                        startIndex++;
                        endIndex++;
                    }
                }
                mean--;
            }
        }
        return 0;
    }

mitesh joshi · Answer 5 · 2016-01-30T05:06:45+0000

Z-Algorithm would be a good solution.
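
The answer stops at naming the algorithm, so here is one possible sketch of how a Z-function could be used, under my own reading of the task; the names z_function and longest_border_z are illustrative, not from the answer. With z[i] being the length of the longest common prefix of S and S[i:], a length L is a border when z[N - L] == L, and a non-overlapping middle occurrence exists when some start i in [L, N - 2L] has z[i] >= L. Walking L downward from N // 3 makes that window only grow, so a running maximum is enough.

```python
def z_function(s):
    """z[i] = length of the longest common prefix of s and s[i:]; z[0] is left as 0."""
    n = len(s)
    z = [0] * n
    left = right = 0
    for i in range(1, n):
        if i < right:
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def longest_border_z(s):
    n = len(s)
    z = z_function(s)
    start = n // 3
    lo, hi = start, start - 1  # [lo, hi] is the part of the window already scanned
    max_z = 0
    for length in range(start, 0, -1):
        # Middle occurrences may start anywhere in [length, n - 2 * length].
        while hi < n - 2 * length:
            hi += 1
            max_z = max(max_z, z[hi])
        while lo > length:
            lo -= 1
            max_z = max(max_z, z[lo])
        if z[n - length] == length and max_z >= length:
            return length
    return 0


if __name__ == "__main__":
    for word in ("barbararhubarb", "ababab", "baaab"):
        print(word, longest_border_z(word))
```

Each index enters the window at most once, so the scan is O(N) in time on top of the O(N) Z-function, with O(N) extra space for the z array.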
