ElasticSearch results are not relevant

In ElasticSearch, I created two documents with one CategoryMajor field

In doc1, I set CategoryMajor to "Restaurants"

In doc2, I set CategoryMajor to "Restaurants Restaurants Restaurants Restaurants Restaurants"

If I search for CategoryMajor:Restaurants, doc1 is scored as MORE RELEVANT than doc2. This is not the typical Lucene behavior, which gives a document more relevance the more times the term appears in it. doc2 should be MORE RELEVANT than doc1.

How can I fix this?

1 answer

You can add &explain=true to your GET request to see that doc2's score is lowered by the fieldNorm factor. This comes from Lucene's default similarity formula, which lowers the score for longer documents (fields containing more terms). Please read this document on the default Lucene similarity formula:

http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/Similarity.html
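To make the effect of fieldNorm concrete, here is a rough sketch of the two per-field factors from that formula: tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms). It assumes both documents see the same idf and queryNorm (so those are left out), and `quantize_norm` is only a simplified stand-in I wrote for Lucene's lossy one-byte norm storage, not the real encoder. The function names are hypothetical.

```python
import math


def tf(freq):
    # Term-frequency factor of the default similarity: sqrt(freq).
    return math.sqrt(freq)


def length_norm(num_terms):
    # Length normalization of the default similarity: 1 / sqrt(numTerms).
    return 1.0 / math.sqrt(num_terms)


def quantize_norm(value):
    # ASSUMPTION: simplified model of the one-byte norm storage.
    # Keeps 3 mantissa bits after the leading 1 and truncates the rest;
    # the real encoder also limits the exponent range, ignored here.
    mantissa, exponent = math.frexp(value)  # value == mantissa * 2**exponent
    return math.ldexp(math.floor(mantissa * 16) / 16, exponent)


def field_score(freq, num_terms, omit_norms=False):
    # With norms omitted, the fieldNorm factor is simply 1.0.
    norm = 1.0 if omit_norms else quantize_norm(length_norm(num_terms))
    return tf(freq) * norm


# doc1: "Restaurants" (1 occurrence, 1 term)
# doc2: "Restaurants" x 5 (5 occurrences, 5 terms)
doc1 = field_score(freq=1, num_terms=1)
doc2 = field_score(freq=5, num_terms=5)
print("with norms:   ", doc1, doc2)

doc1_no_norms = field_score(freq=1, num_terms=1, omit_norms=True)
doc2_no_norms = field_score(freq=5, num_terms=5, omit_norms=True)
print("without norms:", doc1_no_norms, doc2_no_norms)
```

Without the precision loss, sqrt(5) * 1/sqrt(5) would be exactly 1 and the two documents would tie. In this sketch the stored norm for doc2 is rounded down, so doc2 ends up slightly below doc1. With norms omitted only the tf factor remains and doc2 scores higher.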

To disable this behavior, add "omit_norms": true for the CategoryMajor field to your index mapping by sending a PUT request to:

http://localhost:9200/index/type/_mapping 

with request body:

 { "type": { "properties": { "CategoryMajor": { "type": "string", "omit_norms": true } } } } 
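If you would rather send this from a script, a minimal sketch using only the Python standard library could look like the following. The host, index name and type name are the placeholders from the URL above, and `build_mapping_request` is a hypothetical helper name. The request is only built here, not sent.

```python
import json
import urllib.request


def build_mapping_request(base_url="http://localhost:9200",
                          index="index", doc_type="type"):
    # Mapping body that turns norms off for the CategoryMajor field.
    body = {
        doc_type: {
            "properties": {
                "CategoryMajor": {"type": "string", "omit_norms": True}
            }
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/{doc_type}/_mapping",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_mapping_request()
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it (needs a running node at base_url):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```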

I'm not sure, but you might need to delete your index, create it again, put the mapping above, and then reindex your documents. In any case, reindexing after changing the mapping is required :).


Source: https://habr.com/ru/post/1433012/

