The suffix tree can be thought of as a data structure built on top of a trie vertex, where instead of simply adding a line to a trie, you also add all the possible suffixes of that line. For example, if you want to index a banana string in a suffix tree, you should build a trie with the following lines:
banana anana nana ana na a
After that, you can search for any n-gram and see if it is present in your indexed row. In other words, an n-gram search is a prefix search for all possible suffixes of your string.
This is the easiest and slowest way to create a suffix tree. It turns out that there are many more favorable options in this data structure that improve one or both of the spaces and build time. I do not know enough about this domain to give an overview, but you can start by exploring arrays of suffixes or extended data structures in this class (lectures 16 and 18).
This answer also does a wonderful job explaining a variant of this data structure.
Ze Blob Dec 15 '12 at 16:52 2012-12-15 16:52
source share