Lucene: Compare Query Results

I need to compare the relevance of search results for various Lucene queries.

Specifically, I have an indexed set of text documents, and when a search is run against this set, I want to return not just the top results but all results that match the query "well enough".

This "well enough" threshold should be tunable (say, between 0, meaning the document is completely irrelevant, and 1, meaning the document is the best match), and it should apply to all queries in the same way.

From what I found on the Internet, this is not an easy task. Can someone give me a hint on how to approach this problem?

Thanks a lot!

3 answers

Even if you normalize scores to the interval [0, 1], comparing them across different queries is not meaningful; see "How to normalize Lucene scores?"
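To make the caveat concrete, here is a minimal plain-Java sketch of the kind of normalization the question has in mind: divide every score by the top score of the same query and keep the hits above a threshold. The class name, method name, and score values are hypothetical and no Lucene API is involved.

```java
import java.util.ArrayList;
import java.util.List;

final class RelativeScoreFilter {
    /** Returns the indices of hits whose score is at least threshold * (top score of this query). */
    static List<Integer> keepGoodEnough(float[] scores, float threshold) {
        List<Integer> kept = new ArrayList<>();
        float top = 0f;
        for (float s : scores) {
            top = Math.max(top, s);
        }
        if (top <= 0f) {
            return kept; // no hits, or nothing with a positive score
        }
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] / top >= threshold) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        float[] strongQuery = {4.0f, 3.0f, 1.0f};   // hypothetical raw scores
        float[] weakQuery = {0.2f, 0.15f, 0.05f};   // hypothetical raw scores
        System.out.println(keepGoodEnough(strongQuery, 0.5f));
        System.out.println(keepGoodEnough(weakQuery, 0.5f));
    }
}
```

Both calls keep the same two hits, because after normalization the best hit of every query is 1 by construction. The threshold therefore says how close a hit is to the best hit of its own query, not how relevant it is in absolute terms, which is why it does not carry over from one query to another.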

+1
source

I was just looking for the answer to the same question. Here is what I found looking around:

While it is generally impossible to compare scores between queries, if you restrict yourself to certain query types, for example a BooleanQuery containing only TermQuery clauses, then the results can be compared across queries, provided you disable the coordination factor (coord) in the BooleanQuery constructor.
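The following plain-Java sketch illustrates why the coordination factor gets in the way. It assumes the classic TF-IDF behaviour, where coord is the number of matching clauses divided by the total number of clauses, and it deliberately ignores every other part of the scoring formula (such as the query norm). The class, the method names, and the term score are hypothetical; this is not Lucene code.

```java
final class CoordDemo {
    /** Coordination factor as in classic TF-IDF scoring: matched clauses / total clauses. */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** Sums the scores of the matching clauses and, unless disabled, scales the sum by coord. */
    static float score(float[] matchedTermScores, int totalClauses, boolean disableCoord) {
        float sum = 0f;
        for (float s : matchedTermScores) {
            sum += s;
        }
        if (disableCoord) {
            return sum;
        }
        return sum * coord(matchedTermScores.length, totalClauses);
    }

    public static void main(String[] args) {
        float[] matched = {1.5f}; // hypothetical score of the single matching term

        // With coord, the same matching term scores differently
        // depending on how many clauses the query has.
        System.out.println(score(matched, 2, false));
        System.out.println(score(matched, 4, false));

        // Without coord, the score no longer depends on the query size.
        System.out.println(score(matched, 2, true));
        System.out.println(score(matched, 4, true));
    }
}
```

With coord enabled, the document is penalized more heavily by the four-clause query than by the two-clause query even though the matching term is identical; with coord disabled both queries give the same value.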

0
source

If you want to compare two or more queries, I have found a workaround. You can compare your highest-scored document with your query term using the LevensteinDistance or LuceneLevenshteinDistance (Damerau) class to get the distance between your query term and your result.

The result is a similarity between the two. Do this for each query that you want to compare. You then have a way to compare your queries: the similarity between each query and its top result. Select the query with the highest similarity score and use it for further processing.

    // Damerau-Levenshtein distance
    LuceneLevenshteinDistance d = new LuceneLevenshteinDistance();
    float similarity = d.getDistance(queryterm, yourResult);
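The selection step described above can be sketched without any Lucene dependency. The code below is a plain-Java stand-in, not Lucene's implementation: it uses the basic Levenshtein distance (no transpositions) and turns it into a similarity as 1 - distance / longer length, which is an assumption of this sketch. All class and method names are hypothetical.

```java
final class QuerySimilarity {
    /** Basic Levenshtein edit distance (insertions, deletions, substitutions). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]; 1 means the strings are identical. */
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1f;
        }
        return 1f - editDistance(a, b) / (float) maxLen;
    }

    /** Index of the query whose top result is most similar to it, or -1 if there are none. */
    static int bestQuery(String[] queryTerms, String[] topResults) {
        int best = -1;
        float bestSimilarity = -1f;
        for (int i = 0; i < queryTerms.length; i++) {
            float s = similarity(queryTerms[i], topResults[i]);
            if (s > bestSimilarity) {
                bestSimilarity = s;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] queryTerms = {"lucene", "search"};
        String[] topResults = {"lucine", "search"}; // hypothetical top hit per query
        System.out.println(bestQuery(queryTerms, topResults));
    }
}
```

Note that this compares strings only, so it fits best when the query and the top result are short terms rather than whole documents.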
0
source

Source: https://habr.com/ru/post/893440/

