How to get all document IDs from an Elasticsearch index

How can I get all document identifiers (the internal "_id" of each document) from an Elasticsearch index? If the index holds 20 million documents, what is the best way to do this?

+3
3 answers

For this many documents, you probably want to use the scan and scroll API.

Many client libraries have ready-made helpers for this API. For example, with elasticsearch-py you can do:

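    import elasticsearch
    import elasticsearch.helpers

    es = elasticsearch.Elasticsearch(eshost)
    # "fields": "_id" asks for the ID only (syntax of older Elasticsearch versions)
    scroll = elasticsearch.helpers.scan(es, query={"fields": "_id"}, index=idxname, scroll='10s')
    for res in scroll:
        print(res['_id'])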
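With 20 million documents you will probably want to stream the IDs somewhere rather than print them. The following is a minimal sketch, not part of the original answer: write_ids is a hypothetical helper that accepts any iterable of hits shaped like the ones the scan helper yields (dicts with an "_id" key) and writes one ID per line.

```python
import sys


def write_ids(hits, out=None):
    """Write the "_id" of every hit to `out`, one per line, and return the count.

    `hits` is any iterable of dicts that contain an "_id" key, which is the
    shape of the items yielded by elasticsearch.helpers.scan.
    """
    if out is None:
        out = sys.stdout
    count = 0
    for hit in hits:
        out.write(str(hit["_id"]) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in hits; in real use, pass the iterator returned by the scan helper.
    sample = [{"_id": "1341"}, {"_id": "1342"}]
    write_ids(sample)
```

Because the hits are consumed lazily, memory use stays flat regardless of index size. In real use you would pass the scroll iterator from the snippet above together with an open file object.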
+3

I would just export the entire index and read the IDs from the file system. My experience with both the size/from and scan/scroll options was a disaster when working with result sets in the millions. It just takes too much time.

If you can use a tool such as Knapsack, you can export the index to the file system and iterate over the directories. Each document is stored in its own directory named after its _id, so there is no need to open any files. Just iterate over the directory names.

Knapsack reference: https://github.com/jprante/elasticsearch-knapsack
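Reading the IDs back from such an export could look like the sketch below. It assumes the layout described above, where the export has been unpacked into a directory whose immediate subdirectories are named after the document IDs; the function name ids_from_export and the demo directory are hypothetical, and the actual layout of your export may differ.

```python
import os
import tempfile


def ids_from_export(export_dir):
    """Yield the name of every immediate subdirectory of `export_dir`.

    Assumption: each subdirectory name is a document _id. Plain files in
    `export_dir` are skipped, and nothing inside the subdirectories is opened.
    """
    with os.scandir(export_dir) as entries:
        for entry in entries:
            if entry.is_dir():
                yield entry.name


if __name__ == "__main__":
    # Throwaway directory standing in for a real export.
    with tempfile.TemporaryDirectory() as root:
        for doc_id in ("1341", "1342"):
            os.mkdir(os.path.join(root, doc_id))
        for doc_id in sorted(ids_from_export(root)):
            print(doc_id)
```

os.scandir is used instead of os.listdir because it reports whether an entry is a directory without an extra stat call per entry, which matters when there are millions of them.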

Edit: hopefully you don't need to do this often, or this may not be a viable solution.

+2

First, you can request the total number of documents in the index.

    curl -X GET 'http://localhost:9200/documents/document/_count?pretty=true'

    {
      "count" : 1408,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      }
    }

Then loop through the result set using a combination of the size and from parameters until you reach the total. Passing an empty fields parameter returns only the metadata of each hit (_index, _type, _id and _score) without the document source.

Find a page size that you can use without running out of memory, and increase the from value by that amount on each iteration.

 curl -X GET 'http://localhost:9200/documents/document/_search?fields=&size=1000&from=5000' 

Each hit in the response looks like this:

 { "_index" : "documents", "_type" : "document", "_id" : "1341", "_score" : 1.0 }, ... 
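The count-then-page loop described above could be sketched as follows. This is an illustration under assumptions, not the answerer's code: iter_ids and fetch_page are hypothetical names, the URL and the fields= parameter are taken from the curl examples, and the response is assumed to carry its hits under hits.hits.

```python
import json
import urllib.request


def fetch_page(base_url, start, size):
    """Fetch one page of hits from `base_url`/_search, IDs and metadata only."""
    url = "{}/_search?fields=&size={}&from={}".format(base_url, size, start)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["hits"]["hits"]


def iter_ids(total, page_size, get_page):
    """Yield the _id of every hit, requesting one page at a time.

    `get_page(start, size)` must return a list of hit dicts; `start` is
    advanced by `page_size` until `total` has been covered.
    """
    for start in range(0, total, page_size):
        for hit in get_page(start, page_size):
            yield hit["_id"]


if __name__ == "__main__":
    # Assumed endpoint and count, copied from the curl examples.
    base = "http://localhost:9200/documents/document"
    for doc_id in iter_ids(1408, 1000, lambda s, n: fetch_page(base, s, n)):
        print(doc_id)
```

Passing the page fetcher in as a function keeps the paging logic independent of the HTTP details. Note that newer Elasticsearch versions cap from + size at index.max_result_window (10,000 by default), so for an index with millions of documents this approach may only be usable on older clusters or with that setting raised.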
0

Source: https://habr.com/ru/post/976877/

