Generating String Substrings in Java

I am trying to find all substrings of a given string. For a random string such as rymis, the substrings would be [i, is, m, mi, mis, r, ry, rym, rymi, rymis, s, y, ym, ymi, ymis]. According to Wikipedia, a string of length n has n * (n + 1) / 2 substrings.

These can be found by executing the following code snippet:

    import java.util.Set;
    import java.util.TreeSet;

    final Set<String> substring_set = new TreeSet<String>();
    final String text = "rymis";
    for (int iter = 0; iter < text.length(); iter++) {                // start index
        for (int ator = 1; ator <= text.length() - iter; ator++) {     // substring length
            substring_set.add(text.substring(iter, iter + ator));
        }
    }

This works for short strings, but it obviously slows down for long ones, since the nested loops run n(n + 1)/2 times, which is O(n^2).
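Note that the n(n + 1)/2 formula counts substrings by position, while the TreeSet in the snippet keeps only distinct ones, so the two numbers differ as soon as a substring repeats. A small sketch comparing them (the class and method names are hypothetical):

```java
import java.util.Set;
import java.util.TreeSet;

class SubstringCount {
    // Substrings counted by (start, end) position: the n(n + 1)/2 formula.
    static long countByPosition(String text) {
        long n = text.length();
        return n * (n + 1) / 2;
    }

    // Distinct substrings, collected into a TreeSet the same way as the snippet above.
    static int countDistinct(String text) {
        Set<String> seen = new TreeSet<>();
        for (int start = 0; start < text.length(); start++) {
            for (int end = start + 1; end <= text.length(); end++) {
                seen.add(text.substring(start, end));
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(countByPosition("rymis") + " " + countDistinct("rymis")); // 15 15
        System.out.println(countByPosition("aaa") + " " + countDistinct("aaa"));     // 6 3
    }
}
```

For rymis every character is different, so both counts are 15; for aaa the set collapses a, a, a and aa, aa.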

I have also been reading about suffix trees, which can do inserts in O(n), and noticed that the same substrings can be obtained by inserting the string, deleting one character from the right, and reinserting, until the string is empty. That should cost about O(1 + … + (n - 1) + n), the sum of the first n integers: n(n + 1)/2 = (n^2 + n)/2, which again is O(n^2). However, it seems that some suffix trees can do inserts in log2(n) time, which would be better, giving O(n log2(n)) overall.
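For comparison with the suffix-tree idea, here is a minimal sketch of an uncompressed suffix trie. It is not a real suffix tree: there is no edge compression and no linear-time (Ukkonen-style) construction. It inserts every suffix rather than every prefix; the prefixes of all suffixes are exactly the substrings, so each trie node corresponds to one distinct substring and the trie has up to n(n + 1)/2 nodes. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SuffixTrieSketch {
    // One trie node; a TreeMap keeps children sorted so the walk below is lexicographic.
    static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
    }

    // Inserts every suffix of text into an uncompressed trie. Each root-to-node path spells
    // one distinct substring, so the node count and the number of insert steps are O(n^2).
    static Node build(String text) {
        Node root = new Node();
        for (int start = 0; start < text.length(); start++) {
            Node cur = root;
            for (int i = start; i < text.length(); i++) {
                cur = cur.children.computeIfAbsent(text.charAt(i), c -> new Node());
            }
        }
        return root;
    }

    // Depth-first walk that records the path label of every non-root node.
    static void collect(Node node, StringBuilder path, List<String> out) {
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey().charValue());
            out.add(path.toString());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1);
        }
    }

    static List<String> distinctSubstrings(String text) {
        List<String> out = new ArrayList<>();
        collect(build(text), new StringBuilder(), out);
        return out;
    }

    public static void main(String[] args) {
        // prints [i, is, m, mi, mis, r, ry, rym, rymi, rymis, s, y, ym, ymi, ymis]
        System.out.println(distinctSubstrings("rymis"));
    }
}
```

The output matches the list in the question, but building the trie is still quadratic, which illustrates why the suffix-tree route alone does not get below O(n^2) when every substring has to be produced.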

Before I delve into suffix trees: is this the right route to take, is there another algorithm that would be more efficient for this, or is O(n^2) as good as it gets?

3 answers

This is your example with the loops inverted (outer loop over the substring length, inner loop over the start index), but it is still O(n^2).

    import java.util.ArrayList;

    String s = "rymis";
    ArrayList<String> al = new ArrayList<String>();
    for (int i = 1; i <= s.length(); i++) {        // collect substrings of length i (including the whole string)
        for (int k = 0; k < s.length(); k++) {     // start index for a substring of length i
            if (i + k > s.length()) break;         // it would run past the end of s; move on
            al.add(s.substring(k, k + i));         // add the substring of length i at index k
        }
    }

I wanted to post a recursive example. After a couple of recursive attempts, I came up with this iterative approach instead, which uses two sliding windows as an improvement on the method above. I did have a recursive version, but I had trouble reducing the size of its recursion tree.

    import java.util.ArrayList;

    String s = "rymis";
    int n = s.length();
    ArrayList<String> al = new ArrayList<String>();
    for (int i = 1; i <= n; i++) {                 // substring length
        for (int k = 0; k <= (n - i) / 2; k++) {   // the two windows move inward until they meet
            int a = k;           // left bound, window 1
            int b = k + i;       // right bound, window 1
            int c = n - i - k;   // left bound, window 2 (mirror of window 1)
            int d = n - k;       // right bound, window 2
            al.add(s.substring(a, b));             // add window 1
            if (a < c) al.add(s.substring(c, d));  // add window 2 unless it coincides with window 1
        }
    }

Using an ArrayList adds overhead that affects performance, so the next version uses more basic structures.

    String s = "rymis";
    int n = s.length();
    StringBuilder sb = new StringBuilder();
    for (int i = 1; i <= n; i++) {
        for (int k = 0; k <= (n - i) / 2; k++) {
            int a = k;           // left bound, window 1
            int b = k + i;       // right bound, window 1
            int c = n - i - k;   // left bound, window 2
            int d = n - k;       // right bound, window 2
            if (sb.length() > 0) sb.append(',');   // separator before every item but the first
            sb.append(s, a, b);                     // add window 1 without an intermediate String
            if (a < c) {
                sb.append(',');
                sb.append(s, c, d);                 // add window 2
            }
        }
    }
    String joined = sb.toString();
    String[] sArray = joined.split(",");            // assumes s itself contains no commas
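Since the number of substrings is known in advance (n(n + 1)/2), another option in the same spirit is to write them straight into a pre-sized array, which avoids both list growth and the join-and-split round trip. A minimal sketch; the class and method names are hypothetical:

```java
import java.util.Arrays;

class SubstringArray {
    // Collects every substring (by position) into an array sized up front from n(n + 1)/2,
    // so no list growth, separator characters, or split() pass is needed.
    static String[] allSubstrings(String s) {
        int n = s.length();
        String[] out = new String[n * (n + 1) / 2];
        int next = 0;
        for (int len = 1; len <= n; len++) {                  // substring length
            for (int start = 0; start + len <= n; start++) {  // every start index that fits
                out[next++] = s.substring(start, start + len);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [r, y, m, i, s, ry, ym, mi, is, rym, ymi, mis, rymi, ymis, rymis]
        System.out.println(Arrays.toString(allSubstrings("rymis")));
    }
}
```

Unlike the split-based version, this also works when the input contains commas. It is still O(n^2) substring operations.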

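I am sure you cannot beat O(n^2) for this, as mentioned in the comments on the question: there are n(n + 1)/2 substrings to produce, so any algorithm that outputs all of them needs at least that many steps.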
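Written out, the counting argument: a string of length n has n - k + 1 substrings of each length k, so the number of substrings is

```latex
\sum_{k=1}^{n} (n - k + 1) = \frac{n(n+1)}{2} \in \Theta(n^2)
```

and if every substring is materialized as its own String (String.substring copies its characters in modern JDKs, since Java 7 update 6), the total number of characters copied is

```latex
\sum_{k=1}^{n} k\,(n - k + 1) = \frac{n(n+1)(n+2)}{6} \in \Theta(n^3)
```

For rymis (n = 5) these give 15 substrings and 35 characters.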

I was curious about a different way of coding this, so I wrote it quickly and decided to post it here.

The solution I propose here is not, I think, asymptotically faster, but it runs fewer inner and outer loop iterations. It also makes no duplicate inserts: each (start, end) pair is added exactly once.

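    import java.util.ArrayList;

    String str = "rymis";
    ArrayList<String> subs = new ArrayList<String>();
    while (str.length() > 0) {
        subs.add(str);                           // the whole current string
        for (int i = 1; i < str.length(); i++) {
            subs.add(str.substring(i));          // proper suffix starting at i
            subs.add(str.substring(0, i));       // proper prefix of length i
        }
        str = str.substring(1, Math.max(str.length() - 1, 1)); // drop the first and last character
    }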

I'm not sure about the exact algorithm, but you can look at Ropes:

http://en.wikipedia.org/wiki/Rope_(computer_science)

In general, ropes are better suited when the data is large and changes often.

I believe a rope is superior to String for your problem.
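Ropes are not part of the JDK, and a full rope is too long for a short example. The property that would matter here is that a rope can take a substring by sharing the original characters instead of copying them. That idea alone can be sketched with a minimal read-only view; `Slice` and `SliceDemo` are hypothetical names, not a JDK or rope-library type, and this is not a rope.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical read-only view of text[start, end): stores two indices instead of copying characters.
final class Slice implements CharSequence {
    private final String text;
    private final int start;
    private final int end;

    Slice(String text, int start, int end) {
        if (start < 0 || start > end || end > text.length()) {
            throw new IndexOutOfBoundsException("bad slice " + start + ".." + end);
        }
        this.text = text;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return text.charAt(start + index);
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || from > to || to > length()) {
            throw new IndexOutOfBoundsException("bad range " + from + ".." + to);
        }
        return new Slice(text, start + from, start + to); // still no copy
    }

    @Override
    public String toString() {
        return text.substring(start, end); // characters are copied only here, on demand
    }
}

class SliceDemo {
    // Every substring as a view; each one is created in O(1), so building the list is O(n^2).
    static List<Slice> allSlices(String text) {
        List<Slice> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int end = start + 1; end <= text.length(); end++) {
                out.add(new Slice(text, start, end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Slice> slices = allSlices("rymis");
        System.out.println(slices.size()); // 15
        System.out.println(slices);        // [r, ry, rym, rymi, rymis, y, ym, ymi, ymis, m, mi, mis, i, is, s]
    }
}
```

Building the views stays quadratic; only calling toString on a view copies its characters, so the cost of copying is paid just for the substrings that are actually needed as Strings.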


Source: https://habr.com/ru/post/1397876/

