Facet Counting Issues

I'm trying to use ElasticSearch for analytics, specifically to track "top content" for a hand-rolled Rails CMS. The requirement is quite a bit more complicated than keeping a counter for each piece of content. I won't go into the full problem here, though, because I can't even get the basics to work.

My problem is this: I'm using facets, and the counts are not what I expect. For instance:

Query:

{"facets":{"el_ids":{"terms":{"field":"el_id","size":1,"all_terms":false,"order":"count"}}}} 

Result:

 {"el_ids":{"_type":"terms","missing":0,"total":16672,"other":16657,"terms":[{"term":"quis","count":15}]}} 

Ok, fine, the piece of content with id "quis" had 15 hits, and since order is count, this should be my top piece of content. Now let's get the top 5 pieces of content.

Query:

 {"facets":{"el_ids":{"terms":{"field":"el_id","size":5,"all_terms":false,"order":"count"}}}} 

Result (facet only):

 [ {"term":"qgz9","count":26}, {"term":"quis","count":15}, {"term":"hnqn","count":15}, {"term":"higp","count":15}, {"term":"csns","count":15} ] 

Huh? So the piece of content with id "qgz9" had more hits, with 26? Why wasn't it the top result in the first query?

Ok, now let's get the top 100.

Query:

 {"facets":{"el_ids":{"terms":{"field":"el_id","size":100,"all_terms":false,"order":"count"}}}} 

Results (facet only):

 [ {"term":"qgz9","count":43}, {"term":"difc","count":37}, {"term":"zryp","count":31}, {"term":"u65r","count":31}, {"term":"sxsi","count":31}, ... ] 

So now "qgz9" has 43 hits instead of 26? How can that be? I can assure you that nothing is modifying the index in the background. If I repeat these queries, I get the same results.

As I keep increasing the result size, the counts continue to change, and new pieces of content appear at the top. Can someone explain what I'm doing wrong, or where my understanding of how this works is off?

1 answer

Turns out this is a known issue:

... the way top N facets work is to get the top N from each shard and merge the results. This can lead to inaccurate results.
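To make the quoted behaviour concrete, here is a minimal Python sketch of "take the top N from each shard, then merge". It is only a simulation, not ElasticSearch code: the three shards, the terms "x", "a", "b", "c" and their counts are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter

def shard_top_n(shard_counts, n):
    """Return the n highest-count (term, count) pairs of a single shard."""
    return Counter(shard_counts).most_common(n)

def merged_top_n(shards, n):
    """Approximate top n: merge only each shard's own top-n list."""
    merged = Counter()
    for shard in shards:
        for term, count in shard_top_n(shard, n):
            merged[term] += count
    return merged.most_common(n)

def exact_top_n(shards, n):
    """Exact top n: sum every term over all shards before ranking."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total.most_common(n)

# Hypothetical per-shard counts: "x" is never first on any single shard,
# but it is the most frequent term overall (9 + 9 + 8 = 26).
shards = [
    {"a": 15, "x": 9},
    {"b": 14, "x": 9},
    {"c": 13, "x": 8},
]

print(merged_top_n(shards, 1))  # "x" is dropped by every shard
print(merged_top_n(shards, 2))  # "x" now survives on every shard
print(exact_top_n(shards, 1))
```

With size 1 every shard reports only its local winner, so "x" never reaches the merge step. With size 2 it is reported by all three shards and jumps to the top, which matches the pattern in the question where a term appears, and its count grows, as size increases.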

By default, my index was created with 5 shards. After changing the index to have only one shard, the counts behave as I expected. Another workaround would be to always set size to a value greater than the number of expected facet terms, and then take the top N results yourself.
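The second workaround could look roughly like the sketch below. It only builds the same request body as in the question and trims the facet part of the response on the client side; it does not talk to ElasticSearch. The helper names, the oversized value 1000 and the sample response are assumptions for illustration, and the counts are only exact if size really does exceed the number of distinct terms.

```python
def facet_query(facet_name, field, size):
    """Build a terms facet request body shaped like the one in the question."""
    return {
        "facets": {
            facet_name: {
                "terms": {
                    "field": field,
                    "size": size,
                    "all_terms": False,
                    "order": "count",
                }
            }
        }
    }

def top_n_terms(facet_result, n):
    """Keep only the n highest-count entries of a terms facet result."""
    ranked = sorted(facet_result["terms"], key=lambda t: t["count"], reverse=True)
    return ranked[:n]

# Ask for far more terms than needed (1000 is an arbitrary assumption).
body = facet_query("el_ids", "el_id", 1000)

# Made-up response fragment in the shape shown in the question.
sample_result = {
    "_type": "terms",
    "terms": [
        {"term": "t1", "count": 7},
        {"term": "t2", "count": 12},
        {"term": "t3", "count": 3},
    ],
}

print(top_n_terms(sample_result, 2))
```

The trade-off is that every shard has to return and the coordinating node has to merge a much longer list, so this is only practical while the number of distinct el_id values stays reasonably small.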


Source: https://habr.com/ru/post/919932/

