As Yuval explained, overall Lucene focuses on exact matches (by normalizing terms with analyzers in terms of both index and query time).
The Lucene trunk code (a version not yet released) actually uses a suffix tree for inaccurate matches such as Regex, Wildcard, and Fuzzy.
How it works is that the Lucene dictionary dictionary is indeed a form of suffix tree. You can see this in the file formats you mentioned in several places:
Thus, if the text of the previous word was “bone” and the term “boy”, the Length prefix is two, and the suffix is “y”.
The term information index gives us "random access" by indexing this tree at specific intervals (every 128th term is the default).
Thus, a low-level tree is a suffix tree, but at a higher level, we use these properties (mainly those specified in IndexReader.terms for processing the dictionary as a determinate finite state machine (DFA):
Returns an enumeration of all members starting with this term. If the given term does not exist, then the enumeration is placed in the first member, exceeding the stated term. Enumeration is ordered using Term.compareTo (). Each member is larger than everything that precedes it in the listing.
Inaccurate queries such as Regex, Wildcard, and Fuzzy are themselves also defined as DFAs, and “matching” is just an intersection of DFAs.
source share