I need to solve a problem that goes beyond my very basic Elasticsearch knowledge.
I have a set of objects, each of which has a set of tags. For example:
obj_1 = ["a", "b", "c"]
obj_2 = ["a", "b"]
obj_3 = ["c", "b"]
I want to search for objects using weighted tags. For instance:
search_tags = {'a': 1.0, 'c': 1.5}
I want the search tags to be combined as an OR query. That is, I do not want to exclude documents that lack some of the requested tags, but I do want the results ordered so that documents with more total weight come first (roughly: the sum, over the matching tags, of each tag's weight).
Using the above example, the returned documents should be ordered like this:
- obj_1 (rating: 1.0 + 1.5)
- obj_3 (rating: 1.5)
- obj_2 (rating: 1.0)
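To make the intended ordering concrete, here is a minimal Python sketch of the scoring described above, run outside Elasticsearch. The function name `rank_by_weighted_tags` is hypothetical and only used for illustration. It sums the weights of the matching tags and keeps any object that matches at least one tag.

```python
def rank_by_weighted_tags(objects, search_tags):
    """Return (name, score) pairs, highest score first.

    An object is kept if it has at least one requested tag (OR semantics);
    its score is the sum of the weights of the tags it matches.
    """
    scored = []
    for name, tags in objects.items():
        matched = [weight for tag, weight in search_tags.items() if tag in tags]
        if matched:  # at least one requested tag is present
            scored.append((name, sum(matched)))
    # Sort by score, largest first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


objects = {
    "obj_1": ["a", "b", "c"],
    "obj_2": ["a", "b"],
    "obj_3": ["c", "b"],
}
search_tags = {"a": 1.0, "c": 1.5}

ranking = rank_by_weighted_tags(objects, search_tags)
for name, score in ranking:
    print(name, score)
```

This is only a reference for the ordering I expect; it says nothing about how Elasticsearch computes its own scores.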
What is the best approach to this regarding the structure of the document and the correct way to query ES?
There is a similar question here: Sustainable search - the power of tags (nested / child documentation), but I don't want to specify the weight when indexing; I want this to be done when searching.
My current setup is as follows.
Objects:
[
  { "title": "1", "tags": ["a", "b", "c"] },
  { "title": "2", "tags": ["a", "b"] },
  { "title": "3", "tags": ["c", "b"] },
  { "title": "4", "tags": ["b"] }
]
And my request:
{
  "query": {
    "custom_filters_score": {
      "query": {
        "terms": {
          "tags": ["a", "c"],
          "minimum_match": 1
        }
      },
      "filters": [
        {"filter": {"term": {"tags": "a"}}, "boost": 1.0},
        {"filter": {"term": {"tags": "c"}}, "boost": 1.5}
      ],
      "score_mode": "total"
    }
  }
}
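In practice the request is generated from the search_tags mapping rather than written by hand. A minimal Python sketch of that step follows; `build_query` is a hypothetical helper name, and the sketch only builds the same request body as above without sending it to a cluster.

```python
import json


def build_query(search_tags):
    """Build the custom_filters_score request body from a tag -> weight mapping.

    The structure mirrors the hand-written request: a terms query over all
    requested tags, plus one boosted filter per tag.
    """
    return {
        "query": {
            "custom_filters_score": {
                "query": {
                    "terms": {
                        "tags": list(search_tags),  # all requested tags, OR-ed
                        "minimum_match": 1,
                    }
                },
                "filters": [
                    {"filter": {"term": {"tags": tag}}, "boost": weight}
                    for tag, weight in search_tags.items()
                ],
                "score_mode": "total",
            }
        }
    }


body = build_query({"a": 1.0, "c": 1.5})
print(json.dumps(body, indent=2))
```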
The problem is that it only returns objects 1 and 3. Shouldn't it also match object 2 (which has the tag "a"), or am I doing something wrong?
UPDATE, AS SUGGESTED
OK. I replaced boost with script in each filter and removed minimum_match. My request:
{
  "query": {
    "custom_filters_score": {
      "query": {
        "terms": {
          "tags": ["a", "c"]
        }
      },
      "filters": [
        {"filter": {"term": {"tags": "a"}}, "script": "1.0"},
        {"filter": {"term": {"tags": "c"}}, "script": "1.5"}
      ],
      "score_mode": "total"
    }
  }
}
Response:
{
  "_shards": { "failed": 0, "successful": 5, "total": 5 },
  "hits": {
    "hits": [
      {
        "_id": "3",
        "_index": "test",
        "_score": 0.23837921,
        "_source": { "tags": ["c", "b"], "title": "3" },
        "_type": "bit"
      },
      {
        "_id": "1",
        "_index": "test",
        "_score": 0.042195037,
        "_source": { "tags": ["a", "b", "c"], "title": "1" },
        "_type": "bit"
      }
    ],
    "max_score": 0.23837921,
    "total": 2
  },
  "timed_out": false,
  "took": 3
}
The order is still wrong and one result is still missing. obj_1 should come before obj_3 (since it has both tags), and obj_2 does not appear at all. How can that be?