Is Lucene's fuzzy search lazy?

I would like to use Lucene's fuzzy search, which, as I understand it, is based on an algorithm similar to Levenshtein distance. If I use a fairly high threshold (e.g. "New York ~ 0.9"), will it first compute the full edit distance and then check the result against the 0.9 threshold, or will it cut the algorithm short as soon as it becomes obvious that the document cannot match the query closely enough? I understand that this is possible with the Levenshtein algorithm.

+3
1 answer

will it cut the algorithm short if it becomes obvious that the document does not closely match the query?

No. The code you want to see is lines 57-59 of FuzzyTermEnum:

int dist = editDistance(text, target, textlen, targetlen);
distance = 1 - ((double)dist / (double)Math.min(textlen, targetlen));
return (distance > FUZZY_THRESHOLD);
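
Rearranging the comparison in this excerpt shows what a threshold means in terms of edit distance. This is only a worked derivation from the three lines above; the symbols d (for dist), n and m (for textlen and targetlen) and t (for FUZZY_THRESHOLD) are introduced here for illustration.

```latex
1 - \frac{d}{\min(n, m)} > t
\quad\Longleftrightarrow\quad
d < (1 - t)\,\min(n, m)
% Example: t = 0.9 and \min(n, m) = 8 give d < 0.8, so only d = 0 passes.
% Example: t = 0.8 and \min(n, m) = 6 give d < 1.2, so d \le 1 passes.
```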

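So the full edit distance is always computed first, and only then is the resulting similarity compared with the threshold; there is no early exit.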
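
For comparison, here is a minimal sketch of the kind of early exit the question describes. It is not Lucene code: the class BoundedEditDistance and its methods are hypothetical names made up for this example, and the similarity formula is simply copied from the excerpt above. The idea is that the smallest value in each row of the Levenshtein table never decreases, so once a whole row exceeds the largest distance the threshold allows, the terms cannot match.

```java
final class BoundedEditDistance {

    /** Sentinel returned when the distance is known to exceed the bound. */
    static final int EXCEEDED = -1;

    /**
     * Levenshtein distance between a and b, or EXCEEDED as soon as it is
     * certain that the distance is greater than maxDist.
     */
    static int distance(String a, String b, int maxDist) {
        // The difference in length is a lower bound on the edit distance.
        if (Math.abs(a.length() - b.length()) > maxDist) {
            return EXCEEDED;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Row minima never decrease, so the final distance is >= rowMin.
            if (rowMin > maxDist) {
                return EXCEEDED;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        int d = prev[b.length()];
        return d > maxDist ? EXCEEDED : d;
    }

    /** Same test as the excerpt above, but with an early exit. */
    static boolean isSimilar(String text, String target, double threshold) {
        int minLen = Math.min(text.length(), target.length());
        if (minLen == 0) {
            // Assumption of this sketch: empty terms match only each other.
            return text.equals(target);
        }
        // Loose upper bound on the distances that could still pass.
        int maxDist = (int) Math.floor((1 - threshold) * minLen);
        int dist = distance(text, target, maxDist);
        if (dist == EXCEEDED) {
            return false;
        }
        // Final check uses the same formula as the excerpt.
        double similarity = 1 - ((double) dist / (double) minLen);
        return similarity > threshold;
    }
}
```

With a threshold of 0.9 and six-letter terms, for example, the allowed distance is 0, so BoundedEditDistance.isSimilar("lucene", "lucine", 0.9) gives up after a few rows instead of filling in the whole table.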

+2

Source: https://habr.com/ru/post/1751953/

