Fast string search like startsWith(), not equals()

I have a sorted list (a dictionary of 100 thousand words) and many words to look up in it, often. So performance is an issue. I know that HashSet.contains(theWord) or Collections.binarySearch(sortedList, theWord) is very fast. But I'm not really looking for the whole word.

I mean searching for "se" and getting all the words that start with "se". So, is there a ready-to-use solution in Java or in any library?

A better example: on a sorted list, a fast solution for the following operation

List.subList(String beginIndex, String endIndex) // returns the interval

myWordList.subList("ab", "bc");

Note: there is a very similar question, but its accepted answer is not satisfactory: Overriding HashSet's Contains Method

+3
4 answers

What you are looking for here is a data structure called a trie:

http://en.wikipedia.org/wiki/Trie

It stores the strings in a tree indexed by prefix: the first level of the tree holds the first character of each string, the second level holds the second character, and so on. As a result, it lets you retrieve subsets of a very large set of strings by prefix very quickly.
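
To make the idea concrete, here is a minimal trie sketch in Java. It is only an illustration under my own assumptions, not a library API: the names PrefixTrie, add and withPrefix are made up for this example, and children are kept in a TreeMap so that results come back in sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical minimal trie; class and method names are illustrative only.
public class PrefixTrie {
    private static final class Node {
        // TreeMap keeps children ordered by character, so traversal is sorted.
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    /** Inserts a word, creating one node per character as needed. */
    public void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.isWord = true;
    }

    /** Returns every stored word that starts with the given prefix. */
    public List<String> withPrefix(String prefix) {
        List<String> result = new ArrayList<>();
        Node node = root;
        // Walk down to the node that represents the prefix.
        for (int i = 0; i < prefix.length(); i++) {
            node = node.children.get(prefix.charAt(i));
            if (node == null) {
                return result; // no stored word has this prefix
            }
        }
        collect(node, new StringBuilder(prefix), result);
        return result;
    }

    // Depth-first walk that gathers all words below the given node.
    private static void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isWord) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey().charValue());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1); // backtrack
        }
    }
}
```

Reaching the prefix node costs one step per prefix character, regardless of how many words are stored; after that the work is proportional to the size of the matching subtree.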

+9

Use a trie. See also: Google Collections/Guava Trie implementation.

+2

: . , , ( ).

List.subList(String beginIndex, String endIndex)// , ? ?

+2

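Since the list is sorted, binary search is enough.

First, binary-search for the prefix (e.g. "se"). If it is found, that position is the start of the range; if it is not found, take the position where it would be inserted.

Then binary-search for the prefix + "z" (e.g. "sez"). If "sez" is not found, take the position where it would be inserted; if it is found, take its position + 1.

The words between the first position and the second are the ones that start with the prefix.

Caveats:

  • "b" must sort after "az"
  • "z" must be the highest char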

I have this algorithm implemented in the JavaScript data processing library (jOrder.net).
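
For Java, the same two-search idea can be sketched on top of Collections.binarySearch. This is a rough illustration rather than the jOrder code: the class and method names are made up, the list is assumed to be sorted by natural String order with no duplicates, and instead of "z" it appends Character.MAX_VALUE as the upper sentinel, which assumes no word contains that character.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper; names are illustrative only.
public class PrefixRange {

    /** Index of the first element >= key in a list sorted by natural order. */
    private static int lowerBound(List<String> sorted, String key) {
        int pos = Collections.binarySearch(sorted, key);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos : -pos - 1;
    }

    /**
     * Returns a view of all words starting with the prefix.
     * Assumes a sorted list without duplicates and without U+FFFF in any word.
     */
    public static List<String> withPrefix(List<String> sorted, String prefix) {
        int from = lowerBound(sorted, prefix);
        // prefix + U+FFFF sorts after every word that starts with the prefix.
        int to = lowerBound(sorted, prefix + Character.MAX_VALUE);
        return sorted.subList(from, to);
    }
}
```

Both searches are O(log n) on a random-access list, and subList returns a view, so no words are copied. If the words live in a TreeSet<String> instead, subSet(from, to) gives a comparable range view with the same kind of bounds.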

+1

Source: https://habr.com/ru/post/1756986/

