Fuzzy search with Lucene

I implemented a fuzzy search with Lucene 4.3.1, but I am not satisfied with the results. I would like to specify the number of results it should return. For example, if I ask for 10 results, it should return the top 10 matches, no matter how bad they are. Most of the time it doesn't return anything if the word I'm searching for is very different from everything in the index. How can I get more, or fuzzier, results?

Here is the code I have:

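    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.FuzzyQuery;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public String[] luceneQuery(String query, int numberOfHits, String path) throws IOException {
        Directory index = FSDirectory.open(new File(path));
        IndexReader reader = DirectoryReader.open(index);
        try {
            IndexSearcher searcher = new IndexSearcher(reader);

            // Build the FuzzyQuery from the raw term. The "~" suffix is QueryParser
            // syntax and must not be part of the Term text.
            // 2 is the maximum edit distance FuzzyQuery accepts in Lucene 4.x.
            Query fuzzyQuery = new FuzzyQuery(new Term("label", query), 2);

            // Ask for at most numberOfHits documents, best score first.
            ScoreDoc[] fuzzyHits = searcher.search(fuzzyQuery, numberOfHits).scoreDocs;

            String[] fuzzyResults = new String[fuzzyHits.length];
            for (int i = 0; i < fuzzyHits.length; ++i) {
                Document d = searcher.doc(fuzzyHits[i].doc);
                fuzzyResults[i] = d.get("label");
            }
            return fuzzyResults;
        } finally {
            reader.close();
        }
    }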
1 answer

Large edit distances are no longer supported by FuzzyQuery in Lucene 4.x. The current FuzzyQuery implementation is a huge performance improvement over the Lucene 3.x implementation, but it only supports up to two edits. Distances greater than 2 Damerau-Levenshtein edits are considered rarely useful.
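To make the two-edit limit concrete, here is a minimal plain-Java sketch of a Damerau-Levenshtein style distance (the optimal string alignment variant, where an adjacent transposition counts as one edit). It is only an illustration: the class and method names are hypothetical, and this is not how Lucene computes matches internally. Any indexed term whose distance from the query term is greater than 2 cannot be matched by FuzzyQuery, which is why a very different search word returns nothing.

```java
final class EditDistance {
    private EditDistance() {}

    /**
     * Optimal string alignment distance: insertion, deletion, substitution
     * and transposition of two adjacent characters each cost 1.
     */
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all i characters of a
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all j characters of b
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "en" <-> "ne".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    /** True if b is within the two-edit limit discussed above (assumed limit: 2). */
    static boolean withinFuzzyLimit(String a, String b) {
        return distance(a, b) <= 2;
    }
}
```

For example, EditDistance.distance("lucene", "lucnee") is 1 (one transposition), so that pair is within the limit, while "lucene" and "solr" are more than two edits apart and would never match.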

According to the FuzzyQuery documentation, if you really need higher edit distances:

If you really want this, consider using an n-gram indexing technique (such as the SpellChecker in the suggest module) instead.

The strong implication is that you should rethink what you are trying to accomplish and find a more useful approach.
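As a rough illustration of the n-gram idea, the sketch below ranks labels by how many character trigrams they share with the search word and always returns the requested number of matches, however weak. It is plain Java over an in-memory list, not Lucene's SpellChecker or an index; the class name, the "_" padding and the Dice-style score are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class NGramRanker {
    private NGramRanker() {}

    /** Character n-grams of the lower-cased text, padded with '_' at both ends. */
    static Set<String> ngrams(String text, int n) {
        StringBuilder padded = new StringBuilder();
        for (int i = 0; i < n - 1; i++) {
            padded.append('_');
        }
        padded.append(text.toLowerCase(Locale.ROOT));
        for (int i = 0; i < n - 1; i++) {
            padded.append('_');
        }
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= padded.length(); i++) {
            grams.add(padded.substring(i, i + n));
        }
        return grams;
    }

    /** Dice coefficient over the two n-gram sets: 0.0 (nothing shared) to 1.0 (same set). */
    static double similarity(String a, String b, int n) {
        Set<String> gramsA = ngrams(a, n);
        Set<String> gramsB = ngrams(b, n);
        Set<String> shared = new HashSet<>(gramsA);
        shared.retainAll(gramsB);
        int total = gramsA.size() + gramsB.size();
        return total == 0 ? 0.0 : 2.0 * shared.size() / total;
    }

    /** Returns up to numberOfHits labels, best trigram overlap first, with no score cutoff. */
    static List<String> topMatches(String query, List<String> labels, int numberOfHits) {
        List<String> ranked = new ArrayList<>(labels);
        ranked.sort(Comparator
                .comparingDouble((String label) -> similarity(query, label, 3))
                .reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(numberOfHits, ranked.size())));
    }
}
```

Because there is no minimum score, a call such as NGramRanker.topMatches(word, labels, 10) returns ten labels whenever at least ten exist, even when none of them is close to the search word. Scoring every label in memory only suits small label sets; for a real index, the n-grams would be indexed as terms so that Lucene does the ranking.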


Source: https://habr.com/ru/post/949824/

