I am doing map clustering using the geohash grid aggregation in Elasticsearch. A request returns 100-200 buckets on average. Each bucket has a top_hits sub-aggregation, which I use to return 3 documents for each aggregated cluster.
The problem is that I want to return top_hits only when the parent aggregation (GeoHash) combines no more than three documents.
If a cluster combines more than three documents, I do not want Elasticsearch to return any documents for that cluster (because I will not use them).
I tried using the Bucket Selector Aggregation, but failed to construct the correct buckets_path. I put the bucket selector aggregation at the same level as the top_hits aggregation. The total number of documents in the bucket is available in top_hits.hits.total, but when I reference it I get reason=path not supported for [top_hits]: [hits.total].
Is this possible in Elasticsearch? It matters to me because in most queries only a small percentage of buckets contain three or fewer documents, yet the top_hits sub-aggregation always returns the top 3 documents, even for clusters of 1000 documents. If a query returns 200 buckets and only 5 of them aggregate <= 3 documents, I want to receive only 5 * 3 documents, not 200 * 3, which makes the response 10 MB.
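For illustration, the desired outcome can be sketched in Python as a filter applied to a mocked response. This is only a sketch of the rule being asked for (keep top_hits when doc_count <= 3, drop it otherwise), not a server-side solution. The bucket keys, document ids, and helper names (`make_bucket`, `strip_large_cluster_hits`, `count_returned_docs`) are made up for the example, and the response layout assumes the usual `aggregations -> clusters -> buckets` shape with the aggregation names from my request.

```python
import copy


def make_bucket(key, doc_count, top_hits_size=3):
    """Build a mock geohash bucket shaped like an Elasticsearch response bucket.

    top_hits returns at most `top_hits_size` documents, however large the bucket is.
    """
    returned = min(doc_count, top_hits_size)
    docs = [{"_id": f"{key}-{i}"} for i in range(returned)]
    return {"key": key, "doc_count": doc_count, "top_hits": {"hits": {"hits": docs}}}


def strip_large_cluster_hits(response, max_docs=3):
    """Return a copy of the response without top_hits for buckets above max_docs."""
    result = copy.deepcopy(response)
    for bucket in result["aggregations"]["clusters"]["buckets"]:
        if bucket["doc_count"] > max_docs:
            # The bucket itself (key and doc_count) is kept; only the documents go.
            bucket.pop("top_hits", None)
    return result


def count_returned_docs(response):
    """Count the documents carried by top_hits across all buckets."""
    buckets = response["aggregations"]["clusters"]["buckets"]
    return sum(
        len(b["top_hits"]["hits"]["hits"]) for b in buckets if "top_hits" in b
    )


# Hypothetical response: one large cluster and two small ones.
mock_response = {
    "aggregations": {
        "clusters": {
            "buckets": [
                make_bucket("u33", 1000),
                make_bucket("u09", 2),
                make_bucket("gcp", 3),
            ]
        }
    }
}

filtered = strip_large_cluster_hits(mock_response)
docs_before = count_returned_docs(mock_response)
docs_after = count_returned_docs(filtered)
print(docs_before, docs_after)
```

Here the large cluster keeps its key and doc_count, so it can still be drawn on the map, but it no longer carries any documents. The goal is for Elasticsearch to produce the filtered shape directly, so the unused documents are never sent.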
Here is part of my request:
"clusters": {
  "geohash_grid": {
    "field": "coordinates",
    "precision": 3
  },
  "aggs": {
    "top_hits": {
      "top_hits": {
        "size": 3
      }
    },
    "top_hits_filter": {
      "bucket_selector": {
        "buckets_path": {
          "total_hits": "top_hits._count" // tried top_hits.hits.total
        },
        "script": {
          "inline": "total_hits <= 3"
        }
      }
    }
  }
}