Is there a smarter way to reindex Elasticsearch?

I ask because our search is in flux while we work on it, but every time we change the index (the tokenizer, a filter, or the number of shards/replicas), we have to delete the whole index and re-index all of our Rails models into Elasticsearch. That means we have to plan for downtime while all of our records are reindexed.

Is there a smarter way to do this that I don't know about?

ruby-on-rails elasticsearch
Dec 13
4 answers

I think @karmi's answer is right. However, let me explain it a bit more simply. I sometimes had to update the production schema with new properties or analysis settings. I recently started using the procedure below to migrate an index live, under constant load and with zero downtime. You can do it remotely.

Here are the steps:

Assumptions:

  • You have an index real1 and aliases real_write, real_read pointing to it,
  • the client writes only to real_write and reads only from real_read,
  • _source is available for the documents (it is needed to copy them).

1. New index

Create a real2 index with the new mapping and settings of your choice.

2. Writer alias switch

Switch the writer alias to redirect write requests:

 curl -XPOST 'http://esserver:9200/_aliases' -H 'Content-Type: application/json' -d '
 {
   "actions" : [
     { "remove" : { "index" : "real1", "alias" : "real_write" } },
     { "add" : { "index" : "real2", "alias" : "real_write" } }
   ]
 }'

This is an atomic operation. From this point on, real2 is populated with new client data on all nodes. Readers still use the old real1 through real_read. This is eventual consistency.
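If you would rather drive the alias switch from Ruby than from curl, here is a minimal sketch using only the standard library. The helper names `alias_swap_actions` and `swap_alias!` are hypothetical, and `es_url` is assumed to be something like `http://esserver:9200`. Both actions travel in a single `_aliases` request, which is what makes the switch atomic.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: body for POST /_aliases that moves `alias_name`
# from `old_index` to `new_index`. Sending both actions in one request
# lets Elasticsearch apply them atomically.
def alias_swap_actions(alias_name, old_index, new_index)
  {
    "actions" => [
      { "remove" => { "index" => old_index, "alias" => alias_name } },
      { "add"    => { "index" => new_index, "alias" => alias_name } }
    ]
  }
end

# Hypothetical helper: send the swap. Elasticsearch 6+ rejects request
# bodies without a Content-Type header, so set it explicitly.
def swap_alias!(es_url, alias_name, old_index, new_index)
  uri = URI.join(es_url, "/_aliases")
  req = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
  req.body = JSON.generate(alias_swap_actions(alias_name, old_index, new_index))
  res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
  raise "alias swap failed: #{res.code} #{res.body}" unless res.is_a?(Net::HTTPSuccess)
  JSON.parse(res.body)
end
```

Step 2 would then be `swap_alias!("http://esserver:9200", "real_write", "real1", "real2")`, and the reader switch in step 4 is the same call with `real_read`.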

3. Old data migration

Data must be copied from real1 to real2, but new documents in real2 must not be overwritten by old records. The migration script should use the bulk API with the create operation (not index or update). I am using a simple Ruby script, es-reindex, which reports a nice ETA status:

 $ ruby es-reindex.rb http://esserver:9200/real1 http://esserver:9200/real2 
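The key detail is the bulk action type. Below is a minimal sketch (not the es-reindex code itself) of turning documents read from real1 into an NDJSON body for `POST /real2/_bulk` using `create`. It assumes each document is a hash with `"_id"` and `"_source"`, the shape of search hits; on older Elasticsearch versions that still had mapping types, each action line also needed a `_type`. Items whose id already exists in real2 come back as 409 conflicts in the bulk response; those are expected and should be skipped, not retried.

```ruby
require "json"

# Hypothetical helper: build an NDJSON bulk body of "create" actions.
# "create" fails for an item whose id already exists in the target
# index, so documents written by clients after the alias switch are
# not overwritten by older copies.
def bulk_create_body(docs)
  return "" if docs.empty?

  lines = docs.flat_map do |doc|
    [
      JSON.generate({ "create" => { "_id" => doc["_id"] } }), # action line
      JSON.generate(doc["_source"])                            # document line
    ]
  end
  lines.join("\n") + "\n" # the bulk API requires a trailing newline
end
```

Send the result with `Content-Type: application/x-ndjson`.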

UPDATE 2017: Instead of using a script, consider the new Reindex API. It has many interesting features, such as conflict reporting.
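With the Reindex API the same "don't overwrite newer documents" rule is expressed in the request body. A minimal sketch of that body, built in Ruby (the helper name is hypothetical): `"op_type": "create"` turns every write into a create, and `"conflicts": "proceed"` makes the job count version conflicts instead of aborting on the first one.

```ruby
require "json"

# Hypothetical helper: body for POST /_reindex that copies source_index
# into dest_index without overwriting documents already in dest_index.
def reindex_body(source_index, dest_index)
  {
    "conflicts" => "proceed",                                 # count conflicts, keep going
    "source"    => { "index" => source_index },
    "dest"      => { "index" => dest_index, "op_type" => "create" } # never overwrite
  }
end
```

POST `JSON.generate(reindex_body("real1", "real2"))` to `http://esserver:9200/_reindex`. For a large index you may prefer adding `?wait_for_completion=false` and polling the returned task.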

4. Reader alias switch

Now real2 is up to date and clients write to it, but they still read from real1. Update the reader alias:

 curl -XPOST 'http://esserver:9200/_aliases' -H 'Content-Type: application/json' -d '
 {
   "actions" : [
     { "remove" : { "index" : "real1", "alias" : "real_read" } },
     { "add" : { "index" : "real2", "alias" : "real_read" } }
   ]
 }'

5. Backing up and deleting the old index

Writes and reads now go to real2. You can back up the real1 index and delete it from the ES cluster.
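One way to do the backup is the snapshot API. The sketch below is an assumption about how you might wire it, not part of the procedure above: it snapshots only real1 into an already-registered repository (the name `backups` is hypothetical, registered beforehand with `PUT /_snapshot/backups`), waits for completion, and deletes the index only if the snapshot request succeeded.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: PUT /_snapshot/<repo>/<snapshot>, waiting until
# the snapshot finishes, restricted to a single index.
def snapshot_request(es_url, repo, snapshot, index)
  uri = URI.join(es_url, "/_snapshot/#{repo}/#{snapshot}?wait_for_completion=true")
  req = Net::HTTP::Put.new(uri, "Content-Type" => "application/json")
  req.body = JSON.generate({ "indices" => index, "include_global_state" => false })
  req
end

# Hypothetical helper: DELETE /<index>.
def delete_index_request(es_url, index)
  Net::HTTP::Delete.new(URI.join(es_url, "/#{index}"))
end

# Run the snapshot first; the raise stops us before the delete
# if the snapshot fails.
def archive_and_delete!(es_url, repo, snapshot, index)
  [snapshot_request(es_url, repo, snapshot, index),
   delete_index_request(es_url, index)].each do |req|
    uri = req.uri
    res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
    raise "#{req.method} #{req.path} failed: #{res.code} #{res.body}" unless res.is_a?(Net::HTTPSuccess)
  end
end
```

For example: `archive_and_delete!("http://esserver:9200", "backups", "real1-final", "real1")`.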

Done!

Jul 03 '13 at 11:15

Yes, there are more reasonable ways to reindex your data without downtime.

First, never, ever use the "final" index name as your real index name. So, if you want your index to be named "articles", do not use that name for the physical index; instead create an index such as "articles-2012-12-12", "articles-A", "articles-1", etc.

Second, create an alias that points to this index. Your application then uses only the alias, so you never have to change the index name manually, restart the application, etc.

Third, when you want or need to reindex the data, reindex it into another index, say "articles-B"; all the indexing tools in Tire support you here.

When you're done, point the alias at the new index. This way you not only minimize downtime (there is none), you also get a safe snapshot: if you somehow botched the indexing into the new index, you can simply switch back to the old one until you resolve the issue.
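The naming scheme and the alias flip can be sketched as two small helpers. This is an illustration under assumptions, not Tire's API: the helper names are hypothetical and the output is just the `POST /_aliases` body. The first deploy only adds the alias; later deploys remove it from the old index and add it to the new one in the same request; a rollback is the same call with the arguments swapped.

```ruby
require "date"

# Hypothetical helper: physical index name with a date suffix, e.g.
# "articles-2012-12-12". The application only ever uses the alias.
def versioned_index_name(alias_name, date = Date.today)
  "#{alias_name}-#{date.strftime('%Y-%m-%d')}"
end

# Hypothetical helper: POST /_aliases body pointing `alias_name` at
# `new_index`. If `old_index` is given, the alias is removed from it in
# the same atomic request. To roll back, swap new_index and old_index.
def point_alias_actions(alias_name, new_index, old_index = nil)
  actions = []
  actions << { "remove" => { "index" => old_index, "alias" => alias_name } } if old_index
  actions << { "add" => { "index" => new_index, "alias" => alias_name } }
  { "actions" => actions }
end
```

For example, `point_alias_actions("articles", "articles-B", "articles-A")` moves the alias forward, and `point_alias_actions("articles", "articles-A", "articles-B")` moves it back.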

Dec 13 '12 at 9:21

I wrote a blog post about how I recently handled reindexing without downtime. It took some time to figure out all the little things that need to be in place to do this. Hope this helps!

https://summera.imtqy.com/infrastructure/2016/07/04/reindexing-elasticsearch.html

Summarizing:

Step 1: Prepare a New Index

Create a new index with the new mapping. It can live on the same Elasticsearch instance or on a new one.

Step 2: Keeping Indexes Up to Date

While you are reindexing, you want to keep both the new and the old indexes up to date. For writes, this can be done by having a background worker send each write to both the new and the old index.

Deletes are a bit trickier because there is a race condition between deleting a record and reindexing it into the new index. So you will need to keep track of the records that need to be deleted during your reindex and process them when you are done. If you don't perform many deletions, another option is to disallow deletions during your reindex.
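Here is a minimal, single-threaded sketch of that bookkeeping. `DualWriter` is a hypothetical class, and `client` is assumed to be any object responding to `index(index:, id:, body:)` and `delete(index:, id:)`; the elasticsearch-ruby client has methods of that shape, but treat the wiring as an assumption. Writes go to both indexes; deletes hit the old index right away but are queued for the new index and replayed after the bulk copy, so the copy cannot resurrect a document deleted in the meantime.

```ruby
# Hypothetical helper for the reindex window (not thread-safe).
class DualWriter
  def initialize(client, old_index, new_index)
    @client = client
    @old_index = old_index
    @new_index = new_index
    @pending_deletes = []
    @reindexing = true
  end

  # Called by the background worker for every create/update.
  def write(id, body)
    [@old_index, @new_index].each { |idx| @client.index(index: idx, id: id, body: body) }
  end

  # During the reindex, record the delete for the new index instead of
  # applying it, because the bulk copy could re-create the document.
  def delete(id)
    @client.delete(index: @old_index, id: id)
    if @reindexing
      @pending_deletes << id
    else
      @client.delete(index: @new_index, id: id)
    end
  end

  # Call once the bulk copy has finished. A real client may raise
  # "not found" for ids that never reached the new index; that is safe to ignore.
  def finish_reindex!
    @reindexing = false
    @pending_deletes.uniq.each { |id| @client.delete(index: @new_index, id: id) }
    @pending_deletes.clear
  end
end
```

In production the queue would more likely live in your job backend or database than in memory, so it survives a worker restart.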

Step 3: Re-Index

You want to use a scroll search to read the data and the bulk API to insert it. Since, after step 2, new and updated documents are written to the new index in the background, you must make sure your bulk API requests do not overwrite existing documents in the new index.

This means the operation you want for bulk API requests is create, not index. From the documentation: “create will fail if a document with the same index and type already exists, whereas index will add or replace a document as necessary.” The key point is that you don't want old data from the scroll search to overwrite new data in the new index.

There is a great script on GitHub to help you with this process: es-reindex.
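The scroll-plus-bulk loop can be sketched as follows. This is an illustration, not es-reindex: `copy_with_scroll` is a hypothetical helper, and `http` is an assumed callable `(method, path, body) -> parsed JSON hash` that you would back with Net::HTTP or your client. It opens a scroll sorted by `_doc` (the cheapest order for scrolling), sends each page as bulk `create` actions, and clears the scroll at the end.

```ruby
require "json"

# Hypothetical helper: copy every document from `source` to `dest`.
# Returns the number of documents *sent*; items that already exist in
# `dest` fail with 409 in the bulk response and are left untouched.
def copy_with_scroll(http, source, dest, batch_size: 500)
  page = http.call("POST", "/#{source}/_search?scroll=1m",
                   JSON.generate({ "size" => batch_size, "sort" => ["_doc"] }))
  sent = 0
  loop do
    hits = page["hits"]["hits"]
    break if hits.empty?

    # "create", not "index": never overwrite newer data in `dest`.
    body = hits.flat_map do |h|
      [JSON.generate({ "create" => { "_id" => h["_id"] } }), JSON.generate(h["_source"])]
    end.join("\n") + "\n"
    http.call("POST", "/#{dest}/_bulk", body) # send as application/x-ndjson
    sent += hits.size

    page = http.call("POST", "/_search/scroll",
                     JSON.generate({ "scroll" => "1m", "scroll_id" => page["_scroll_id"] }))
  end
  # Free the scroll context on the server.
  http.call("DELETE", "/_search/scroll", JSON.generate({ "scroll_id" => page["_scroll_id"] }))
  sent
end
```

Injecting `http` keeps the loop easy to test with a fake; in practice you would also inspect each bulk response for errors other than 409.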

Step 4: Switch

Once you're done reindexing, it's time to switch your search to the new index. You will want to re-enable deletes, or process the queued deletes against the new index. You may notice that searches on the new index are a bit slow at first. This is because Elasticsearch and the JVM need time to warm up.

Make the necessary code changes so that your application searches the new index. You can keep writing to the old index in case you run into problems and need to roll back. If you think that is unnecessary, you can stop writing to it.

Step 5: Cleaning

At this point, you should be fully transitioned to the new index. If everything is going well, perform any necessary cleanup, for example:

  • Delete the old index's host if it differs from the new one
  • Delete serialization code associated with your old index
Jul 07 '16 at 15:30

Maybe create another index, reindex all the data into it, and then switch over once the reindex is done?

Dec 13