I am trying to find all substrings in a given string. For a random string such as rymis , the subsequences would be [i, is, m, mi, mis, r, ry, rym, rymi, rymis, s, y, ym, ymi, ymis] . From Wikipedia, a string of length n will have substrings n * (n + 1) / 2 .
What can be found by executing the following code snippet:
final Set<String> substring_set = new TreeSet<String>(); final String text = "rymis"; for(int iter = 0; iter < text.length(); iter++) { for(int ator = 1; ator <= text.length() - iter; ator++) { substring_set.add(text.substring(iter, iter + ator)); } }
Which works for short string lengths, but obviously slows down for long lengths since the algorithm is close to O(n^2) .
Also reading on the suffix trees, which can do inserts in O(n) , and noticed that the same subsequences can be obtained by reinserting the substrings, deleting 1 character to the right until the string is empty. Which should be about O(1 + β¦ + (n-1) + n) , which is the summation of n β n(n+1)/2 β (n^2 + n)/ 2 , which again is close to O(n^2) . Although it seems that there are some suffix trees that can do inserts in log2(n) time, which would be better than O(n log2(n)) .
Before I delve into Suffix Villages, is this the right route to take, is there another algorithm that would be more efficient for this, or would O(n^2) be as good as it gets?