The usual way to quickly substring is to create a data structure that contains all the suffixes of all the strings you want to find. Depending on the organization, this may be called a "suffix" or a "suffix array". For example, if you have 1000 lines and each of them has a length of 50 characters, you have 1,000 x 50 non-trivial suffixes, i.e. Your suffix array will have 50,000 entries.
Then, to perform the mapping, you perform a binary search (if an array) or a tree search (if a tree) to find all these suffixes in the data structure, the beginning of which corresponds to the line written in the search window. Since this is the beginning of the suffix that you are matching, you can use standard search procedures (binary search, downstream) to quickly get the result. Each suffix is associated with the lines in which it appears.
Example: you have two lines: "CAT" and "DOT". Your suffix array looks like this (pay attention to lexicography = alphabetical ordering):
#1 AT --> CAT #2 CAT --> CAT #3 DOT --> DOT #4 OT --> DOT #5 T --> DOT, CAT
Note that there are six suffixes, but two of them are the same (the last “T” in “CAT” and “DOT”), and both are represented by the same entry (# 5).
Now that the user is entering a search, for example. "OT" (which must match "DOT"), you can do a simple search by lexicographic ordering during registration, because now you are looking for an initial match in an array of suffixes.
This is the basic principle for quick text search when the search pattern does not contain wildcards.
Antti Huima Feb 12 '09 at 17:39 2009-02-12 17:39
source share