I wrote a blog post about how I recently handled reindexing Elasticsearch without downtime. It takes a while to figure out all the little things that need to be in place to do this. Hope this helps!
https://summera.imtqy.com/infrastructure/2016/07/04/reindexing-elasticsearch.html
Summarizing:
Step 1: Prepare a New Index
Create your new index with the new mapping. It can live on the same Elasticsearch instance as the old index or on a brand new instance.
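As a rough illustration of this step, here is a minimal sketch that only builds the create-index request (it does not send it). The index name, field names, and the helper `build_create_index_request` are made up for the example, and the typeless mapping shape assumes a recent Elasticsearch version; older versions nest the properties under a type name.

```python
import json


def build_create_index_request(index_name, properties, shards=1, replicas=1):
    """Return (method, path, body) for creating an index with an explicit mapping."""
    body = {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        # Assumption: typeless mapping (Elasticsearch 7+). Older versions
        # expect {"<type_name>": {"properties": ...}} here instead.
        "mappings": {"properties": properties},
    }
    return "PUT", "/" + index_name, json.dumps(body)


# Hypothetical index name and fields, for illustration only.
method, path, body = build_create_index_request(
    "products_v2",
    {"name": {"type": "text"}, "price": {"type": "float"}},
)
print(method, path)
print(body)
```

The resulting method, path, and body can be sent with whatever HTTP or Elasticsearch client you already use.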
Step 2: Keep Both Indexes Up to Date
While you are reindexing, you want to keep both the new and the old index up to date. For writes, this can be done by sending a write job to a background worker for both the new and the old index.
Deletes are a bit trickier because there is a race condition between deleting a record and reindexing that record into the new index. So you will need to keep track of the records that have to be deleted during the reindex and process them when you are done. If you do not perform many deletes, another option is to disable deletion entirely while the reindex runs.
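To make the delete race concrete, here is a small in-memory sketch. Plain dicts stand in for the two indexes, and the `DualIndexWriter` class and its method names are hypothetical, not part of any Elasticsearch client. It shows one way to queue deletes during the reindex and replay them afterwards.

```python
class DualIndexWriter:
    """In-memory stand-in for writing to an old and a new index at once."""

    def __init__(self):
        self.old_index = {}
        self.new_index = {}
        self.pending_deletes = set()
        self.reindexing = True

    def write(self, doc_id, doc):
        # Writes go to both indexes so neither falls behind.
        self.old_index[doc_id] = doc
        self.new_index[doc_id] = doc
        # A re-created document must not be removed by an earlier queued delete.
        self.pending_deletes.discard(doc_id)

    def delete(self, doc_id):
        self.old_index.pop(doc_id, None)
        self.new_index.pop(doc_id, None)
        if self.reindexing:
            # The reindex may still copy a stale version of this document,
            # so remember the delete and replay it afterwards.
            self.pending_deletes.add(doc_id)

    def finish_reindex(self):
        self.reindexing = False
        for doc_id in self.pending_deletes:
            self.new_index.pop(doc_id, None)
        self.pending_deletes.clear()


writer = DualIndexWriter()
writer.old_index = {"1": {"title": "a"}, "2": {"title": "b"}}
snapshot = dict(writer.old_index)      # what the scrolled search will read
writer.delete("2")                     # delete arrives mid-reindex
for doc_id, doc in snapshot.items():   # reindex copies the stale snapshot
    writer.new_index.setdefault(doc_id, doc)
writer.finish_reindex()                # replay the queued delete
print(sorted(writer.new_index))        # ['1']
```

Without the queued delete, document "2" would reappear in the new index even though it was deleted while the reindex was running.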
Step 3: Re-Index
You want to use scrolled search to read the data and the bulk API to insert it. Since after step 2 you will be writing new and updated documents to the new index in the background, you want to make sure your bulk API requests do not update documents that already exist in the new index.
This means that the operation you want for your bulk API requests is create, not index. From the documentation: “create will fail if a document with the same index and type exists already, whereas index will add or replace a document as necessary.” The main point here is that you do not want old data from the scrolled search to overwrite newer data in the new index.
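For illustration, the create-only bulk payload could be built roughly like this. The helper `build_bulk_create_body` and the index name are assumptions for the sketch, and the action metadata assumes a typeless Elasticsearch version; older versions also expect a `_type` field in the action line.

```python
import json


def build_bulk_create_body(index_name, docs):
    """Build an NDJSON bulk body that uses `create` for every document.

    `docs` is an iterable of (doc_id, source) pairs, for example one page
    of a scrolled search against the old index.
    """
    lines = []
    for doc_id, source in docs:
        # `create` fails for ids that already exist, so fresher documents
        # written by the background workers are left untouched.
        lines.append(json.dumps({"create": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API expects newline-delimited JSON, each line newline-terminated.
    return "".join(line + "\n" for line in lines)


# Hypothetical page of documents read from the old index.
page = [("1", {"title": "first"}), ("2", {"title": "second"})]
print(build_bulk_create_body("products_v2", page), end="")
```

Items that already exist in the new index come back as per-item failures in the bulk response. During a reindex like this one those conflicts are expected and can be ignored.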
There is a great script on GitHub to help you with this process: es-reindex.
Step 4: Switch
Once you're done reindexing, it's time to switch your search over to the new index. You will want to turn deletes back on, or process the queued delete jobs against the new index. You may notice that searching the new index is a bit slow at first. This is because Elasticsearch and the JVM need time to warm up.
Make whatever code changes are needed for your application to start searching the new index. You can keep writing to the old index in case you run into problems and need to roll back. If you feel this is unnecessary, you can stop writing to it.
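One way to picture the switch and the rollback window is a small routing object that the application consults for every search and write. This is only a sketch; the `IndexRouting` class and the index names are hypothetical.

```python
class IndexRouting:
    """Tracks which index serves searches and which indexes receive writes."""

    def __init__(self, old_index, new_index):
        self.old_index = old_index
        self.new_index = new_index
        self.search_index = old_index
        self.write_indexes = [old_index, new_index]

    def switch_search(self):
        # Searches move to the new index; writes still go to both.
        self.search_index = self.new_index

    def roll_back(self):
        # Safe only while the old index is still receiving writes.
        if self.old_index not in self.write_indexes:
            raise RuntimeError("old index is stale; cannot roll back")
        self.search_index = self.old_index

    def retire_old_index(self):
        # Stop dual writes once a rollback is no longer needed.
        self.write_indexes = [self.new_index]


routing = IndexRouting("products_v1", "products_v2")  # hypothetical names
routing.switch_search()
print(routing.search_index)   # products_v2
print(routing.write_indexes)  # ['products_v1', 'products_v2']
```

Keeping the old index in `write_indexes` after the switch is what makes `roll_back` safe. Once `retire_old_index` has been called, the old index starts to go stale and should no longer be searched.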
Step 5: Clean Up
At this point, you should be completely transitioned to the new index. If everything is going well, do any necessary cleanup, such as:
- Delete the old index host if it is different from the new one
- Remove the serialization code related to your old index