I index some sample documents:
curl -XPUT localhost:9200/dt/art/1 -d '{ "basket1": ["banana","apple", "peach"],
"basket2": ["banana", "orange"]}'
curl -XPUT localhost:9200/dt/art/2 -d '{ "basket1":["orange", "apple"],
"basket2":[]}'
curl -XPUT localhost:9200/dt/art/3 -d '{ "basket1": ["apple"],
"basket2": ["apple", "banana"]}'
curl -XPUT localhost:9200/dt/art/4 -d '{ "basket1": ["banana"],
"basket2": ["banana"]}'
I need to find the number of articles where "banana" is found in basket1 but not in basket2, and repeat this for every value in basket1. For example, here I would expect:
banana: 0
apple: 2
peach: 1
orange: 1
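To pin down what I mean, here is a brute-force sketch in plain Python over the four sample documents. No Elasticsearch is involved, and `docs` and `expected_counts` are names made up for this illustration only:

```python
# The four sample documents from above, as plain Python dicts.
docs = [
    {"basket1": ["banana", "apple", "peach"], "basket2": ["banana", "orange"]},
    {"basket1": ["orange", "apple"], "basket2": []},
    {"basket1": ["apple"], "basket2": ["apple", "banana"]},
    {"basket1": ["banana"], "basket2": ["banana"]},
]


def expected_counts(docs):
    """Count, per basket1 value, the documents that have it in basket1 but not in basket2."""
    counts = {}
    for doc in docs:
        in_basket2 = set(doc["basket2"])
        # dict.fromkeys drops duplicates while keeping the original order.
        for value in dict.fromkeys(doc["basket1"]):
            counts.setdefault(value, 0)
            if value not in in_basket2:
                counts[value] += 1
    return counts


print(expected_counts(docs))
# {'banana': 0, 'apple': 2, 'peach': 1, 'orange': 1}
```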
So far, the only solution I have found is to sub-aggregate:
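body = {
    "query": {"match_all": {}},
    "aggs": {
        "basket1_count": {
            # one bucket per distinct value of basket1
            "terms": {"field": "basket1"},
            "aggs": {
                # inside each basket1 bucket, count the basket2 values
                "basket2_count": {
                    "terms": {"field": "basket2", "size": 10000}
                }
            },
        }
    },
}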
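Then I would finish the computation in Python: for each basket1 bucket, take its doc_count (the number of documents where the value is found in basket1) and subtract the doc_count of the basket2 sub-bucket with the same key (the number of those documents where the value is also found in basket2).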
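Roughly, the Python side would look like the sketch below. The function name `count_only_in_basket1` is made up for this post, and `sample_response` is a hand-built dict that only mimics the `aggregations` part of what I assume Elasticsearch returns for the four sample documents (other response fields are left out):

```python
def count_only_in_basket1(response):
    """Map each basket1 value to the number of docs that have it in basket1 but not in basket2."""
    counts = {}
    for bucket in response["aggregations"]["basket1_count"]["buckets"]:
        value = bucket["key"]
        # Find the basket2 sub-bucket with the same key, if there is one.
        in_both = 0
        for sub in bucket["basket2_count"]["buckets"]:
            if sub["key"] == value:
                in_both = sub["doc_count"]
                break
        counts[value] = bucket["doc_count"] - in_both
    return counts


# Assumed shape of the aggregation result for the sample data (hand-built).
sample_response = {
    "aggregations": {
        "basket1_count": {
            "buckets": [
                {"key": "apple", "doc_count": 3, "basket2_count": {"buckets": [
                    {"key": "banana", "doc_count": 2},
                    {"key": "apple", "doc_count": 1},
                    {"key": "orange", "doc_count": 1},
                ]}},
                {"key": "banana", "doc_count": 2, "basket2_count": {"buckets": [
                    {"key": "banana", "doc_count": 2},
                    {"key": "orange", "doc_count": 1},
                ]}},
                {"key": "orange", "doc_count": 1, "basket2_count": {"buckets": []}},
                {"key": "peach", "doc_count": 1, "basket2_count": {"buckets": [
                    {"key": "banana", "doc_count": 1},
                    {"key": "orange", "doc_count": 1},
                ]}},
            ]
        }
    }
}

print(count_only_in_basket1(sample_response))
# {'apple': 2, 'banana': 0, 'orange': 1, 'peach': 1}
```

This gives the numbers I expect, but it means pulling every basket1/basket2 pair back to the client just to read one sub-bucket per value.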
Is there a better way to do this?