Why not use min_score with Elasticsearch?

I'm new to Elasticsearch. I am only interested in returning the most relevant documents, and I came across min_score. The documentation says: "Note, in most cases this does not make much sense," but it does not give a reason. So why doesn't it make sense to use min_score?

EDIT: What I really want is to return only documents whose score is above some threshold. This is what I have:

    data = {
        'min_score': 0.9,
        'query': {
            'match': {'field': 'michael brown'},
        },
    }

Is there a better alternative to the above so that it only returns the most relevant documents?

THX!

EDIT 2: I tried minimum_should_match, and it returns a 400 error:

    "error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed;"

The request was:

    data = {
        'query': {
            'match': {'keywords': 'michael brown'},
            'minimum_should_match': '90%',
        }
    }
+6
2 answers

I have used min_score quite a lot when trying to find documents that are a definitive match for a given set of input data, which is what I use to generate the query.

The score a document gets depends on the query, of course. So I would suggest trying your query in many permutations (different keywords, for example). For each one, decide which document is the first one you would rather not have returned, and note its score. If those scores are similar, they give you a good guess at the value to use for min_score.
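
The calibration step described above can be sketched in a few lines of Python. This is only an illustration: the function name `suggest_min_score`, the `tolerance` and `margin` parameters, and the sample scores are all made up for the example and are not part of Elasticsearch.

```python
from statistics import mean


def suggest_min_score(cutoff_scores, tolerance=0.15, margin=0.01):
    """Suggest a min_score from the score of the first unwanted hit per test query.

    Returns (threshold, consistent). `consistent` is False when the cutoff
    scores spread by more than `tolerance` relative to their mean, in which
    case one fixed threshold is unlikely to suit every query.
    """
    if not cutoff_scores:
        raise ValueError("need at least one cutoff score")
    spread = (max(cutoff_scores) - min(cutoff_scores)) / mean(cutoff_scores)
    # min_score drops hits scoring below the threshold, so the threshold has
    # to sit slightly above the highest unwanted score to exclude all of them.
    threshold = max(cutoff_scores) * (1 + margin)
    return threshold, spread <= tolerance


# Hypothetical scores of the first unwanted document for three test queries.
threshold, consistent = suggest_min_score([0.82, 0.85, 0.80])
print(threshold, consistent)
```

If `consistent` comes back False, the scores vary too much between queries for a single threshold to be a safe choice.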

However, keep in mind that the score does not depend only on the query and the returned document. It also takes into account all the other documents that have data in the fields you are querying. This means that if you test a min_score value against an index of 20 documents, the scores will probably change a lot when you try the same query against a production index of, say, several thousand documents or more. The change can go either way and is not easy to predict.
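
To make the dependence on index size concrete, here is a small sketch of the inverse document frequency (IDF) term, assuming the BM25 similarity that recent Elasticsearch versions use by default (older versions used a TF-IDF variant with a different formula). The document counts are invented for illustration, and IDF is only one factor in the final score.

```python
import math


def bm25_idf(total_docs, docs_with_term):
    """IDF component of BM25 as computed by Lucene's BM25 similarity."""
    return math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))


# Same term, found in 2 documents, in a 20-document test index
# and in a 5000-document production index (hypothetical numbers).
small_index = bm25_idf(20, 2)
large_index = bm25_idf(5000, 2)
print(small_index, large_index)
```

The same term gets a noticeably larger weight in the bigger index, so a min_score tuned on the small index no longer cuts at the same place.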

In my experience, using min_score for this kind of matching requires a fairly complicated query and a set of analyzers to tune the scores contributed by the various components of the query. But what is and isn't included is vital to my application, so you may well be happy with what min_score gives you if you keep things simple.

+4

I don't know if this is the best solution, but it works for me (Java):

    import org.elasticsearch.action.search.SearchResponse;

    // "Tiny" search to discover maxScore.
    // It is fast because it returns only 1 hit.
    SearchResponse probe = client.prepareSearch(INDEX_NAME)
            .setTypes(TYPE_NAME)
            .setQuery(queryBuilder)
            .setSize(1)
            .execute()
            .actionGet();

    // Take maxScore and set minScore to 70% of it.
    float maxScore = probe.getHits().maxScore();
    float minScore = maxScore * 0.7f;

    // Second round, this time with the minimum score applied.
    SearchResponse response = client.prepareSearch(INDEX_NAME)
            .setTypes(TYPE_NAME)
            .setQuery(queryBuilder)
            .setMinScore(minScore)
            .execute()
            .actionGet();

I run the search twice, but the first pass is fast because it returns only one hit, and that is enough to get max_score.
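
For readers working with Python request bodies like the one in the question, the same two-pass idea might look roughly like this. It is a sketch, not a client API: `search` is a hypothetical callable that takes a request body dict and returns the parsed JSON response as a dict, and `ratio` is an assumed parameter name.

```python
def search_with_relative_min_score(search, query, ratio=0.7):
    """Run `query` twice: once to learn max_score, once filtered by ratio * max_score.

    `search` is a hypothetical callable: request body dict in, response dict out.
    """
    # First pass: a single hit is enough to read hits.max_score.
    probe = search({"size": 1, "query": query})
    max_score = probe["hits"]["max_score"]
    if max_score is None:
        # No hits at all, so there is nothing to filter.
        return probe
    # Second pass: same query, now with an absolute min_score.
    return search({"min_score": max_score * ratio, "query": query})
```

Passing the search function in keeps the sketch independent of any particular client library or version.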

NOTE: minimum_should_match works differently. If your query has 4 clauses and you set minimum_should_match to 70%, this does not mean that a hit's score must be above 70%. It means that a document must match at least 70% of the clauses. Elasticsearch rounds the computed number down, so 70% of 4 (2.8) requires at least 2 of the 4 clauses.
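
As for the 400 error in EDIT 2, a likely cause is that minimum_should_match sits next to match inside query rather than inside the match clause itself. A sketch of the request body with the option placed among the field's options follows; the helper name `match_with_min_should_match` is made up for this example.

```python
def match_with_min_should_match(field, text, minimum_should_match):
    """Build a match query body with minimum_should_match in the field options."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "minimum_should_match": minimum_should_match,
                }
            }
        }
    }


data = match_with_min_should_match("keywords", "michael brown", "90%")
print(data)
```

Note that this only controls how many of the analyzed terms must match. It does not put any bound on the score.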

+2

Source: https://habr.com/ru/post/974867/

