Aggregate a field in elasticsearch-dsl using Python

Can someone tell me how to write Python statements that will aggregate (sum and count) information about my documents?


SCRIPT

    from datetime import datetime
    from elasticsearch_dsl import DocType, String, Date, Integer
    from elasticsearch_dsl.connections import connections

    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search, Q

    # Define a default Elasticsearch client
    client = connections.create_connection(hosts=['http://blahblahblah:9200'])

    s = Search(using=client, index="attendance")
    s = s.execute()

    for tag in s.aggregations.per_tag.buckets:
        print (tag.key)

OUTPUT

      File "/Library/Python/2.7/site-packages/elasticsearch_dsl/utils.py", line 106, in __getattr__
        '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
    AttributeError: 'Response' object has no attribute 'aggregations'

What causes this? Is the "aggregations" keyword wrong? Is there another package I need to import? If documents in the "attendance" index have a field called emailAddress, how can I count the documents that have a value for that field?

1 answer

First of all, I now notice that the code I wrote above does not actually define any aggregations. I did not find the documentation on how to use them very readable. I will expand on what I wrote above, changing the index name to make for a nicer example.

    from elasticsearch_dsl import Search
    from elasticsearch_dsl.connections import connections

    # Define a default Elasticsearch client
    client = connections.create_connection(hosts=['http://blahblahblah:9200'])

    s = Search(using=client, index="airbnb", doc_type="sleep_overs")

    # Invalid: executing now and reading an aggregation fails, because no
    # aggregation has been defined yet.
    # response = s.execute()
    # for tag in response.aggregations.per_tag.buckets:
    #     print(tag.key)

    # Let's define an aggregation before executing.
    # 'by_house' is a name you choose; 'terms' is the keyword for the aggregation type.
    # 'field' is also a keyword, and 'house_number' is a field in our ES index.
    s.aggs.bucket('by_house', 'terms', field='house_number', size=0)

Above, we create one bucket per house number, so each bucket's key is a house number. Elasticsearch (ES) always reports the number of documents that fall into each bucket. size=0 means "give us all the buckets", since by default ES returns only 10 (or whatever your developer configured).
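To make the idea of "one bucket per house number" concrete, here is a minimal sketch in plain Python of what a terms aggregation computes. It does not talk to Elasticsearch at all: the documents, the `guest` field and the `terms_buckets` helper are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical documents standing in for the "sleep_overs" documents in the index.
documents = [
    {"guest": "ann", "house_number": 12},
    {"guest": "bob", "house_number": 12},
    {"guest": "cid", "house_number": 7},
    {"guest": "dee", "house_number": 12},
    {"guest": "eve", "house_number": 7},
    {"guest": "fay", "house_number": 3},
]


def terms_buckets(docs, field, size=None):
    """Group docs by `field` and return buckets shaped like those of a terms aggregation."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # most_common(None) returns every key, ordered by descending count;
    # most_common(n) keeps only the n largest buckets.
    return [{"key": key, "doc_count": count} for key, count in counts.most_common(size)]


buckets = terms_buckets(documents, "house_number")
for bucket in buckets:
    print("%s: %d" % (bucket["key"], bucket["doc_count"]))
```

Each bucket carries a `key` (the house number) and a `doc_count`, which are the same two attributes read from the real response further down. Passing a `size` here limits the number of buckets, roughly the way the default of 10 does in ES.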

    # This runs the query.
    response = s.execute()

    # Let's see what is in our results.
    print(response.hits.total)
    print(response.aggregations.by_house.buckets)

    for item in response.aggregations.by_house.buckets:
        print(item.doc_count)

My mistake was assuming that an Elasticsearch query comes with aggregations by default. You define them yourself, then execute the search. The response can then be split up by the aggregations you named.
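As a rough illustration of splitting the response up by aggregation name, the sketch below walks a raw response body as a plain dict. The response is hypothetical (made-up numbers, shaped like what a terms aggregation named `by_house` returns), and `bucket_counts` is a helper invented for this example, not part of elasticsearch-dsl.

```python
# Hypothetical response body for a search that defined one terms aggregation
# named "by_house". The values are invented for illustration.
raw_response = {
    "hits": {"total": 6, "hits": []},
    "aggregations": {
        "by_house": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": 12, "doc_count": 3},
                {"key": 7, "doc_count": 2},
                {"key": 3, "doc_count": 1},
            ],
        }
    },
}


def bucket_counts(response, agg_name):
    """Return {bucket key: doc_count} for one named aggregation.

    Raises KeyError if the request never defined `agg_name`.
    """
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


counts = bucket_counts(raw_response, "by_house")
print(counts)

# Asking for an aggregation that was never defined fails, much like the
# AttributeError in the question.
try:
    bucket_counts(raw_response, "per_tag")
except KeyError as missing:
    print("no aggregation named %s" % missing)
```

Only the names you chose when defining aggregations appear under `aggregations`; nothing is there unless the request asked for it.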

The CURL for the above should look like this.
NOTE: I use the Sense Elasticsearch plugin/extension/add-on for Google Chrome. In Sense, you can use // for comments.

    POST /airbnb/sleep_overs/_search
    {
      // the size 0 here means to not return any hits, just the aggregation part of the result
      "size": 0,
      "aggs": {
        "by_house": {
          "terms": {
            "field": "house_number",
            // the size 0 here means to return all buckets, not just the default 10
            "size": 0
          }
        }
      }
    }

Workaround. Someone on the DSL project's Git repository told me to forget about translating the query into DSL calls and just use the method below. It is simpler, and you can write the tough stuff in CURL. That is why I call it a workaround.

    from elasticsearch_dsl import Search
    from elasticsearch_dsl.connections import connections

    # Define a default Elasticsearch client; it is registered as the default
    # connection, which the Search built below uses.
    client = connections.create_connection(hosts=['http://blahblahblah:9200'])

    # How simple: we just paste the CURL body here.
    body = {
        "size": 0,
        "aggs": {
            "by_house": {
                "terms": {
                    "field": "house_number",
                    "size": 0
                }
            }
        }
    }

    # Build the search from the dict, then set the index and document type.
    s = Search.from_dict(body)
    s = s.index("airbnb")
    s = s.doc_type("sleep_overs")

    t = s.execute()

    for item in t.aggregations.by_house.buckets:
        # item.key will be the house number
        print("%s %s" % (item.key, item.doc_count))

Hope this helps. I now design everything in CURL, then use Python statements to peel apart the results and get what I want. This helps for aggregations with multiple levels (sub-aggregations).
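For the multi-level case, here is a hedged sketch of what "peeling apart" a sub-aggregation result might look like. The `guest_name` field, the `by_guest` aggregation, the sample response and the `flatten` helper are all assumptions made up for this example; only the nesting of `aggs` inside an aggregation and of sub-aggregation results inside each bucket follows the shape Elasticsearch uses.

```python
# Request body with a sub-aggregation: one bucket per house and, inside each
# house, one bucket per guest. "guest_name" is a hypothetical field.
body = {
    "size": 0,
    "aggs": {
        "by_house": {
            "terms": {"field": "house_number"},
            "aggs": {
                "by_guest": {"terms": {"field": "guest_name"}},
            },
        }
    },
}

# Hypothetical "aggregations" section of the response to that request.
aggregations = {
    "by_house": {
        "buckets": [
            {"key": 12, "doc_count": 3, "by_guest": {"buckets": [
                {"key": "ann", "doc_count": 2},
                {"key": "bob", "doc_count": 1},
            ]}},
            {"key": 7, "doc_count": 1, "by_guest": {"buckets": [
                {"key": "cid", "doc_count": 1},
            ]}},
        ]
    }
}


def flatten(aggs, path=()):
    """Yield (bucket keys along the path, doc_count) for the innermost buckets."""
    for agg in aggs.values():
        for bucket in agg.get("buckets", []):
            here = path + (bucket["key"],)
            # Any dict-valued entry with its own "buckets" is a sub-aggregation.
            children = {name: value for name, value in bucket.items()
                        if isinstance(value, dict) and "buckets" in value}
            if children:
                for row in flatten(children, here):
                    yield row
            else:
                yield here, bucket["doc_count"]


rows = list(flatten(aggregations))
for keys, doc_count in rows:
    print("%s -> %d" % (" / ".join(str(k) for k in keys), doc_count))
```

A body like this can be handed to `Search.from_dict(body)` in the same way as the single-level one above; the flattening step is just one possible way to turn the nested buckets into rows.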


Source: https://habr.com/ru/post/984530/

