Sort an array of search fields in an ElasticSearch document by relevance

I have an ElasticSearch index that looks something like this:

    {
      "mappings": {
        "article": {
          "properties": {
            "title": { "type": "string" },
            "tags": { "type": "keyword" }
          }
        }
      }
    }

And data that looks something like this:

    { "title": "Something about Dogs", "tags": ["articles", "dogs"] },
    { "title": "Something about Cats", "tags": ["articles", "cats"] },
    { "title": "Something about Dog Food", "tags": ["articles", "dogs", "dogfood"] }

If I search for dog, I get the first and third documents back, as expected. I can also weight the matching documents however I like (in fact, I use a function_score query to weight on a bunch of fields that are unrelated to this question).

What I would like to do is sort the tags field so that the most relevant tags are returned first, without affecting the sort order of the documents themselves. So I am hoping to get this result:

 { "title": "Something about Dog Food", "tags": ["dogs", "dogfood", "articles"] } 

Instead of what I am getting now:

 { "title": "Something about Dog Food", "tags": ["articles", "dogs", "dogfood"] } 

The documentation for sort and function_score doesn't cover my case. Any help appreciated. Thanks!

+5
2 answers

You cannot sort the _source of a document (your tags array) based on how well each value matches. One way to get this behavior is to use nested fields and inner_hits, which lets you sort the matching nested objects.

My suggestion is to convert your tags to a nested field (I used keyword there just for simplicity, but you can also use text with an analyzer of your choice):

    PUT test
    {
      "mappings": {
        "article": {
          "properties": {
            "title": { "type": "string" },
            "tags": {
              "type": "nested",
              "properties": {
                "value": { "type": "keyword" }
              }
            }
          }
        }
      }
    }
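With this mapping, each tag has to be indexed as an object with a value field rather than as a bare string. A minimal Python sketch of that conversion follows. It assumes documents shaped like the ones in the question, and the helper name to_nested_tags is made up for illustration:

```python
def to_nested_tags(doc):
    """Return a copy of doc whose tags are objects of the form {"value": tag}.

    Hypothetical helper: it only reshapes the document to fit the nested
    mapping above and does not talk to Elasticsearch.
    """
    converted = dict(doc)  # shallow copy so the input document is left untouched
    converted["tags"] = [{"value": tag} for tag in doc.get("tags", [])]
    return converted


original = {"title": "Something about Dog Food", "tags": ["articles", "dogs", "dogfood"]}
print(to_nested_tags(original))
```

The converted document is what would be indexed into the test index in place of the flat version.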

And use this kind of query:

    GET test/_search
    {
      "_source": { "exclude": "tags" },
      "query": {
        "bool": {
          "must": [
            { "match": { "title": "dogs" } },
            {
              "nested": {
                "path": "tags",
                "query": {
                  "bool": {
                    "should": [
                      { "match_all": {} },
                      { "match": { "tags.value": "dogs" } }
                    ]
                  }
                },
                "inner_hits": {
                  "sort": { "_score": "desc" }
                }
              }
            }
          ]
        }
      }
    }

Here you match the nested tags.value field against the same text that you match against the title. The match_all clause keeps the non-matching tags in the result, just with a lower score. Then, using the inner_hits sort, the nested values are ordered by their own score.
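Because the query excludes tags from _source, the ordered tags have to be read from the inner_hits section of each hit. Below is a minimal Python sketch of that step. It assumes the response has already been parsed into a dict, that the inner hits are keyed by the nested path ("tags", the default name), and that each inner hit's _source is the nested object itself ({"value": ...}). That layout can differ between Elasticsearch versions. The sample response and its scores are hand-written for illustration and are not real output:

```python
def sorted_tags(hit, name="tags"):
    """Return the nested tag values of one search hit, in inner_hits order."""
    inner = hit.get("inner_hits", {}).get(name, {}).get("hits", {}).get("hits", [])
    # Assumption: each inner hit's _source is the nested object {"value": "..."}.
    return [inner_hit["_source"]["value"] for inner_hit in inner]


# Hand-written, trimmed-down sample response (hypothetical scores).
sample_response = {
    "hits": {
        "hits": [
            {
                "_source": {"title": "Something about Dog Food"},
                "inner_hits": {
                    "tags": {
                        "hits": {
                            "hits": [
                                {"_score": 1.9, "_source": {"value": "dogs"}},
                                {"_score": 1.0, "_source": {"value": "articles"}},
                                {"_score": 1.0, "_source": {"value": "dogfood"}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}

for hit in sample_response["hits"]["hits"]:
    print(hit["_source"]["title"], sorted_tags(hit))
```

Keep in mind that inner_hits returns only the top three nested hits by default, so documents with more tags need a larger size inside inner_hits.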

@Val's suggestion is very good, but it only works as long as simple substring matching (i1.indexOf(params.search)) is a good enough definition of "relevant tags" for you. Its main advantage is that you do not need to change the mapping.

The big advantage of my solution is that you use Elasticsearch's actual search capabilities to decide which tags are "relevant". The disadvantage is that you need a nested field instead of a plain keyword field.

+5

What you get back from a search call are the source documents. They are returned exactly as they were indexed, which means that if you index ["articles", "dogs", "dogfood"], you will always get that array back in that same order.

One way around this is to declare a script_field that uses a small script to sort the array and return the sorted result.

The script simply moves the terms that contain the search text to the top of the list:

    {
      "_source": ["title"],
      "query": { "match_all": {} },
      "script_fields": {
        "sorted_tags": {
          "script": {
            "lang": "painless",
            "source": "return params._source.tags.stream().sorted((i1, i2) -> i1.indexOf(params.search) > -1 ? -1 : 1).collect(Collectors.toList())",
            "params": { "search": "dog" }
          }
        }
      }
    }

This returns something like the following. As you can see, the sorted_tags array lists the matching terms first:

    {
      "took": 18,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
          {
            "_index": "tests",
            "_type": "article",
            "_id": "1",
            "_score": 1,
            "_source": { "title": "Something about Dog Food" },
            "fields": {
              "sorted_tags": [ "dogfood", "dogs", "articles" ]
            }
          }
        ]
      }
    }
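
The same "matching terms first" rule can also be sketched outside Elasticsearch, for example to try the logic out or to reorder tags client-side after the search returns. This is only an illustration in Python, not part of the query above, and the function name move_matches_first is made up. It uses a stable sort on a single boolean key, so matching tags keep their indexed order among themselves. The comparator in the script only inspects its first argument, which is why the response above shows dogfood before dogs:

```python
def move_matches_first(tags, search):
    """Return tags with those containing `search` moved to the front.

    sorted() is stable, and False sorts before True, so tags that contain
    the search text come first and ties keep their original order.
    """
    return sorted(tags, key=lambda tag: search not in tag)


print(move_matches_first(["articles", "dogs", "dogfood"], "dog"))
# prints ['dogs', 'dogfood', 'articles']
```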
+2

Source: https://habr.com/ru/post/1272862/

