How to get the number of documents that would be added by selecting another term of an aggregation on an array field in Elasticsearch

Let's say we have four documents with a tags field. It can contain several strings, for example foo, bar and baz.

    docA.tags = ['foo']
    docB.tags = ['bar']
    docC.tags = ['foo', 'bar']
    docD.tags = ['foo', 'baz']

I query the documents using aggregations, so I get the four documents and a list of three buckets, each with the count for a specific tag.

    buckets = [
      {key: 'bar', doc_count: 2}, // docB, docC
      {key: 'foo', doc_count: 3}, // docA, docC, docD
      {key: 'baz', doc_count: 1}  // docD
    ]

If I now run another query and add one of these tags, say foo, as a terms filter, I get only the documents that have this tag (docA, docC, docD). This is what I want.

But I also get a new list of buckets with updated counts.

    buckets = [
      {key: 'bar', doc_count: 1}, // docC
      {key: 'baz', doc_count: 1}  // docD
    ]

But these counts do not really match what happens next. They reflect the number of documents that match both the tag I selected first (foo) and the tag of the bucket (bar or baz).

But if I then select a second tag, say baz, I get the documents tagged with foo OR baz. This is because I use a terms filter.

So what I really want is

    buckets = [
      {key: 'bar', doc_count: 1}, // docB
      {key: 'baz', doc_count: 0}
    ]

How can I get counts like these? They should reflect the number of documents that would be added if I select a second tag. An example of this is here.
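To pin down the numbers being asked for, here is a minimal in-memory sketch in Python. It is not an Elasticsearch query; it only restates the four documents above and counts, for each unselected tag, how many documents would be added under OR semantics. The names `DOCS`, `matching` and `added_counts` are made up for this illustration.

```python
# The four example documents from the question.
DOCS = {
    "docA": ["foo"],
    "docB": ["bar"],
    "docC": ["foo", "bar"],
    "docD": ["foo", "baz"],
}


def matching(docs, selected):
    """IDs of documents carrying at least one selected tag (OR semantics)."""
    selected = set(selected)
    return {doc_id for doc_id, tags in docs.items() if set(tags) & selected}


def added_counts(docs, selected):
    """For each unselected tag, count the documents that selecting it would add."""
    selected = set(selected)
    current = matching(docs, selected)
    all_tags = {tag for tags in docs.values() for tag in tags}
    return {
        tag: len(matching(docs, selected | {tag}) - current)
        for tag in sorted(all_tags - selected)
    }


print(added_counts(DOCS, {"foo"}))  # {'bar': 1, 'baz': 0}
```

With nothing selected, the same function gives the original bucket counts (bar 2, baz 1, foo 3), so the desired behaviour is a generalisation of the first result rather than a different kind of count.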

I already tried using post_filter, but that always gives me the first result. Then I tried the min_doc_count flag for aggs, but that only shows me the combinations that would lead to count=0.

I do have a solution for this, but it seems rather complicated to me. I would have to run another query for each facet, with the filter criteria inverted. So in the example above, I would have to query for all documents that do not have the foo tag and that match the rest of the request. The aggregation results of that query would be exactly what I need.

1 answer

It looks like you are trying to do something a little atypical for facets / buckets.

(It is not invalid, though. It makes sense to want to know how the size of your result set would change if a filter were applied.)

I think you are asking for:

  • Results shown for: QUERY AND FILTER
  • Terms aggregation counts calculated for: QUERY AND NOT FILTER

[Image: Venn diagram of QUERY and FILTER]

You mentioned that you are running a follow-up query to get the counts. You should be able to build this aggregation into your main search query instead.

Structurally, this is:

  • query: (QUERY) or match_all
  • aggregation:
    • filter : {not: ( FILTER )}
      • aggregations: {terms: ...}
  • post_filter : ( FILTER )
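The structure above can be sketched as a request body. The following Python snippet only builds the JSON and does not talk to a cluster. Several things here are assumptions rather than part of the answer: the field name `tags`, the aggregation names `not_selected` and `tags`, the helper name `build_search_body`, and the use of a `bool` query with `must_not` in place of the `not` filter (to my knowledge the standalone `not` filter is not available in newer Elasticsearch versions, so check what your version supports).

```python
import json


def build_search_body(query, selected_tags, field="tags", include_empty=False):
    """Hits come from QUERY AND FILTER; bucket counts from QUERY AND NOT FILTER."""
    # FILTER: a document matches if it carries any of the selected tags.
    tag_filter = {"terms": {field: list(selected_tags)}}

    terms_agg = {"field": field}
    if include_empty:
        # Optional: also return terms whose count is zero.
        terms_agg["min_doc_count"] = 0

    return {
        "query": query,
        "aggs": {
            # Filter aggregation: keep only documents that do NOT match FILTER.
            "not_selected": {
                "filter": {"bool": {"must_not": [tag_filter]}},
                "aggs": {"tags": {"terms": terms_agg}},
            }
        },
        # Applied to the hits only, after the aggregations are computed.
        "post_filter": tag_filter,
    }


body = build_search_body({"match_all": {}}, ["foo"])
print(json.dumps(body, indent=2))
```

The printed body can be sent to the `_search` endpoint with whatever client you already use. Note that with `include_empty=True` the already selected tag would most likely show up as a zero-count bucket as well, so you may want to drop it on the client side.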

The post_filter is executed after the aggregations are calculated (but it still applies to the search results), so your results will be what you expect.

The aggregations run only on the query part of the search. (The post_filter has not been applied yet at that point.)

The filter aggregation excludes all documents matching FILTER from the query results before the terms aggregation calculates the counts.

(This gives you the left outer part of the Venn diagram shown above, but only for the counts.)
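To check the order of operations against the four documents from the question, here is a rough in-memory simulation in Python. It mimics the three stages (query, filter aggregation with terms aggregation, post_filter) with plain dictionaries. It is an illustration of the idea only, not how Elasticsearch executes a search, and `simulate_search` is a hypothetical helper.

```python
from collections import Counter

# The four example documents from the question.
DOCS = {
    "docA": ["foo"],
    "docB": ["bar"],
    "docC": ["foo", "bar"],
    "docD": ["foo", "baz"],
}


def simulate_search(docs, selected):
    """Return (hits, buckets) for a match_all query with FILTER = any selected tag."""
    selected = set(selected)

    # Stage 1: the query (match_all here) picks the candidate documents.
    query_hits = dict(docs)

    # Stage 2: the filter aggregation drops documents matching FILTER,
    # then the terms aggregation counts the tags of what is left.
    agg_scope = [tags for tags in query_hits.values() if not set(tags) & selected]
    buckets = dict(Counter(tag for tags in agg_scope for tag in tags))

    # Stage 3: the post_filter narrows the returned hits to QUERY AND FILTER.
    hits = sorted(doc_id for doc_id, tags in query_hits.items() if set(tags) & selected)
    return hits, buckets


hits, buckets = simulate_search(DOCS, {"foo"})
print(hits)     # ['docA', 'docC', 'docD']
print(buckets)  # {'bar': 1}
```

Only docB survives the filter aggregation, so bar gets a count of 1. The baz bucket does not appear at all in this sketch because no remaining document carries it; a zero-count entry would have to be added separately.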


Source: https://habr.com/ru/post/1236731/
