Elasticsearch aggregations: filtering the top result from each bucket

Given a dataset like this in a single index in Elasticsearch:

  entityId | created    | status
  ---------+------------+----------
  1        | 2000/01/01 | draft
  1        | 2001/01/02 | approved
  2        | 2000/01/01 | draft
  2        | 2000/01/02 | approved
  2        | 2001/01/03 | rejected
  3        | 2000/01/01 | draft
  3        | 2001/01/03 | approved

I want to retrieve only the entities whose newest status is approved.

So I tried aggregations with sub-aggregations, and I managed to get every entity with only its newest status included, like this:

    {
      "size": 0,
      "aggs": {
        "newest-event-query": {
          "terms": { "field": "entityId" },
          "aggs": {
            "newest-event": {
              "top_hits": {
                "size": 1,
                "sort": [ { "created": { "order": "desc" } } ]
              }
            }
          }
        }
      }
    }
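The same request body can also be built programmatically. One detail worth knowing: a terms aggregation returns only 10 buckets by default, so with more than 10 entities this query silently covers only some of them unless size is set on terms. The sketch below (the function name newest_event_query is hypothetical) exposes that as a parameter; it only builds the body and does not send it to a cluster.

```python
# Builds the terms + top_hits request body shown above as a Python dict.
# Assumption: terms_size should be at least the number of distinct entityIds,
# otherwise the terms aggregation keeps only its top `terms_size` buckets.
import json


def newest_event_query(terms_size: int = 10) -> dict:
    return {
        "size": 0,  # no top-level hits, only aggregation results
        "aggs": {
            "newest-event-query": {
                # one bucket per entityId
                "terms": {"field": "entityId", "size": terms_size},
                "aggs": {
                    "newest-event": {
                        # the single newest document in each bucket
                        "top_hits": {
                            "size": 1,
                            "sort": [{"created": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


print(json.dumps(newest_event_query(1000), indent=2))
```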

For the sample data, the aggregation query above returns:

  entityId | created    | status
  ---------+------------+----------
  1        | 2001/01/02 | approved
  2        | 2001/01/03 | rejected
  3        | 2001/01/03 | approved

But I would like to filter this result so that only the approved records (entities 1 and 3) remain, and ultimately be able to query that filtered result directly.
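Until the filtering can be done on the server, one workaround is to filter the aggregation response on the client: walk the buckets and keep an entity only if its single top hit has status approved. This is a sketch under assumptions: the function name approved_latest is hypothetical, the sample response is hand-written and trimmed to the fields the function reads (a real response also carries doc_count, _index, _id, sort and so on), and it does not help with server-side paging or counting, because every bucket still has to be returned.

```python
# Client-side workaround (not an Elasticsearch feature): run the
# terms + top_hits query, then keep only buckets whose newest hit is approved.
from __future__ import annotations

from typing import Any


def approved_latest(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the newest event of each entity, keeping only approved ones."""
    result = []
    buckets = response["aggregations"]["newest-event-query"]["buckets"]
    for bucket in buckets:
        hits = bucket["newest-event"]["hits"]["hits"]
        # top_hits with size 1 yields at most one hit per bucket
        if hits and hits[0]["_source"].get("status") == "approved":
            result.append(hits[0]["_source"])
    return result


def _bucket(entity_id: int, created: str, status: str) -> dict[str, Any]:
    # Hand-written, trimmed-down bucket: only the fields read above.
    source = {"entityId": entity_id, "created": created, "status": status}
    return {"key": entity_id, "newest-event": {"hits": {"hits": [{"_source": source}]}}}


sample_response = {
    "aggregations": {
        "newest-event-query": {
            "buckets": [
                _bucket(1, "2001/01/02", "approved"),
                _bucket(2, "2001/01/03", "rejected"),
                _bucket(3, "2001/01/03", "approved"),
            ]
        }
    }
}

print([doc["entityId"] for doc in approved_latest(sample_response)])  # [1, 3]
```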

Nesting another aggs block inside the top_hits aggregation does not work:

    {
      "size": 0,
      "aggs": {
        "newest-event-query": {
          "terms": { "field": "entityId" },
          "aggs": {
            "newest-event": {
              "top_hits": {
                "size": 1,
                "sort": [ { "created": { "order": "desc" } } ],
                "aggs": {
                  "approved-only": {
                    "filter": { "term": { "status": "approved" } }
                  }
                }
              }
            }
          }
        }
      }
    }

It fails with:

 "error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[gupa9nwpQWmGa3JqFmF2NA][creations][0]: SearchParseException[[creations][0]: from[-1],size[0]: Parse Failure [Failed to parse source [{"size":0,"aggs":{"newest-event-query":{"terms":{"field":"entityId"},"aggs":{"newest-event":{"top_hits":{"size":1,"sort":[{"created":{"order":"desc"}}],"aggs":{"aproved-only":{"filter":{"term":{"status":"approved"}}}}}}}}}}]]]; nested: SearchParseException[[creations][0]: from[-1],size[0]: Parse Failure [Unknown key for a START_OBJECT in [newest-event]: [aggs].]]; }...]", "status": 400

(The same failure is repeated for every shard of the creations and events indices.)
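The parse error is consistent with top_hits being a metrics aggregation, which cannot hold sub-aggregations, so the filter has to live elsewhere. One server-side alternative, assuming Elasticsearch 5.x or later (pipeline aggregations arrived in 2.0 and the script below uses Painless; the error above appears to come from an older release), is to compare, per entity, the newest created value overall with the newest created value among approved events, and let a bucket_selector drop buckets where they differ. The aggregation names (entities, latest, approved, only-latest-approved) and the function name are hypothetical; the sketch assumes created is mapped as a date and entityId and status as exact-value (keyword) fields. It only builds the request body and is untested against a cluster.

```python
# Server-side sketch, assuming Elasticsearch 5.x+ (bucket_selector + Painless).
# Keeps an entity bucket only when its newest event overall has the same
# `created` value as its newest approved event.
import json


def latest_approved_query(terms_size: int = 1000) -> dict:
    return {
        "size": 0,
        "aggs": {
            "entities": {
                # must cover every entity: bucket_selector runs after terms
                # has already picked its top `terms_size` buckets
                "terms": {"field": "entityId", "size": terms_size},
                "aggs": {
                    # newest timestamp of any status
                    "latest": {"max": {"field": "created"}},
                    # newest timestamp among approved events only
                    "approved": {
                        "filter": {"term": {"status": "approved"}},
                        "aggs": {"latest": {"max": {"field": "created"}}},
                    },
                    # drop buckets whose newest event is not an approved one
                    "only-latest-approved": {
                        "bucket_selector": {
                            "buckets_path": {
                                "latest": "latest",
                                "latestApproved": "approved>latest",
                            },
                            "script": "params.latestApproved == params.latest",
                        }
                    },
                    # the newest document itself, for display
                    "newest-event": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"created": {"order": "desc"}}],
                        }
                    },
                },
            }
        },
    }


print(json.dumps(latest_approved_query(), indent=2))
```

Two caveats worth verifying on your version: an entity with no approved events has an empty approved>latest value, which should make the comparison fail so the bucket is dropped; and if an approved event and another event share the same newest timestamp, the entity is kept.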

Any help would be appreciated.

Edit: Filtering by approved first will not work, because an entity can move from approved back to a different status. I always need to filter on the latest status. The point of this exercise is to build an immutable (append-only) data structure: a single object can go through many stages, but we should always query only the last one.
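A tiny plain-Python model of the sample data shows why the order of operations matters: filtering on approved first still finds an approved event for entity 2, even though its latest status is rejected. The helper names are hypothetical, and the dates are rewritten in ISO form so that string comparison orders them chronologically.

```python
# Compare "filter approved, then take newest" with "take newest, then filter".
from __future__ import annotations

# (entityId, created, status), mirroring the sample table
ROWS = [
    (1, "2000-01-01", "draft"),
    (1, "2001-01-02", "approved"),
    (2, "2000-01-01", "draft"),
    (2, "2000-01-02", "approved"),
    (2, "2001-01-03", "rejected"),
    (3, "2000-01-01", "draft"),
    (3, "2001-01-03", "approved"),
]


def newest_per_entity(rows) -> dict[int, tuple[str, str]]:
    """Map each entityId to the (created, status) of its newest row."""
    newest: dict[int, tuple[str, str]] = {}
    for entity_id, created, status in rows:
        if entity_id not in newest or created > newest[entity_id][0]:
            newest[entity_id] = (created, status)
    return newest


def filter_then_latest(rows) -> list[int]:
    # Wrong for this use case: ignores later non-approved events.
    approved = [r for r in rows if r[2] == "approved"]
    return sorted(newest_per_entity(approved))


def latest_then_filter(rows) -> list[int]:
    # What the question asks for: look at the newest event only.
    return sorted(e for e, (_, s) in newest_per_entity(rows).items() if s == "approved")


print(filter_then_latest(ROWS))  # [1, 2, 3]
print(latest_then_filter(ROWS))  # [1, 3]
```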

Edit 2: While looking for a solution I also considered a parent-child structure, but it has its own limitations; for example, has_parent and has_child require a fixed identifier. Another obvious and efficient solution is simply to flag the newest record, for example with a boolean field, but I want atomicity: clearing the boolean on one document and setting it on the new one is not an atomic operation.

Source: https://habr.com/ru/post/1204537/

