How to quickly reindex ElasticSearch?

I have an ElasticSearch index with about 200M documents, the total size of the index is 90Gb.

I changed the mapping, so I would like ElasticSearch to re-index all the documents.

I wrote a script that creates a new index (with the new mapping), then iterates over all the documents in the old index and puts each one into the new index.

This seems to work, but it is very slow. It started at 300 documents per minute two days ago, and the speed has now dropped to 150 documents per minute.

The script runs on a machine on the same network as the ElasticSearch machines.

At this rate, the re-index will take a month to complete.

Does anyone know of a faster technique to reindex an ElasticSearch index?

+6
2 answers

Answered on Google Groups:

Option A: Use bulk index operations.

Option B: Use the re-index plugin that runs inside the ES machine: https://github.com/karussell/elasticsearch-reindex
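To illustrate Option A, here is a minimal sketch of how a batch of documents can be packed into one bulk request instead of being sent one by one. It only builds the newline-delimited body that the `_bulk` endpoint expects; actually sending it is left out. The names `build_bulk_body` and `chunked` are hypothetical helpers, not part of any client library, and older ElasticSearch versions may also need a `_type` in the action line or in the URL.

```python
import json


def build_bulk_body(index_name, docs):
    """Serialize (doc_id, source) pairs into a newline-delimited bulk body.

    Hypothetical helper: each document becomes two lines, an action line
    naming the target index and id, followed by the document source.
    """
    lines = []
    for doc_id, source in docs:
        # Action line: tells the server where the next document goes.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


def chunked(items, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each list produced by `chunked` would be passed to `build_bulk_body` and posted as a single request, so the per-request overhead is paid once per batch rather than once per document. A reasonable batch size depends on document size and cluster capacity and has to be found by experiment.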

+4

The correct way to reindex with Elasticsearch is to use the scan and scroll API, which should be supported by pyes.
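The scroll loop can be sketched roughly as follows. This is not the pyes API; it assumes a `client` object with `search` and `scroll` methods shaped like those of the official Python client, and the function name `scroll_hits` is hypothetical. Sorting by `_doc` is the usual way to scroll efficiently on newer versions; older versions used a dedicated scan search type instead.

```python
def scroll_hits(client, index_name, page_size=1000, keep_alive="5m"):
    """Yield every hit in index_name, fetching one scroll page at a time.

    Assumes `client.search` and `client.scroll` return dicts containing
    "_scroll_id" and "hits" -> "hits", as the official client does.
    """
    # Open the scroll context and fetch the first page.
    page = client.search(
        index=index_name,
        scroll=keep_alive,
        size=page_size,
        body={"query": {"match_all": {}}, "sort": ["_doc"]},
    )
    # An empty page means the scroll is exhausted.
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit
        # Ask for the next page using the scroll id from the previous one.
        page = client.scroll(scroll_id=page["_scroll_id"], scroll=keep_alive)
```

The hits yielded here would then be written to the new index in batches with bulk requests, so that both the read side and the write side avoid one round trip per document.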

The pyes library seems to have a reindex method, but I have no experience with it.

(If you prefer Ruby over Python :), the Tire Ruby client has an Index#reindex method: https://github.com/karmi/tire/blob/master/test/integration/reindex_test.rb . It should be fast enough for your data.)

0

Source: https://habr.com/ru/post/946887/

