Cross-data-center strategy in Elasticsearch

As an on-demand backup, we have two clusters holding the same data. One of them is the primary production cluster, which may fail. What are my best options for real-time replication from one cluster to the other? If one cluster fails, we must be able to switch to the other immediately. Can we use replicas for this?

+6
4 answers

Elasticsearch has poor support for cross-data-center replication. However, one approach we tried works as follows, and it works well at our size. In one data center we take a snapshot of the ES cluster to S3, and in the other data center we restore from that same S3 repository. We do this at regular intervals so that both data centers hold consistent data. Because snapshot/restore is incremental, it suits this problem: only new data is transferred to the other data center. Although this is not real time, it is good enough for us.
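A minimal Python sketch of one snapshot/restore cycle as described above, using the standard snapshot REST endpoints. The repository name `dc_sync`, the snapshot naming scheme, and the helper names are assumptions for illustration, not something from this answer. Depending on the Elasticsearch version, the S3 repository type may need a plugin, and indices that already exist on the standby generally have to be closed or deleted before they can be restored.

```python
import json
import urllib.request
from datetime import datetime, timezone

REPO = "dc_sync"  # hypothetical repository name, not from the answer


def register_repo_request(bucket, readonly=False):
    """Register the shared S3 bucket as a snapshot repository.

    The standby cluster only restores, so it can register the
    repository as read-only.
    """
    settings = {"bucket": bucket}
    if readonly:
        settings["readonly"] = True
    return "PUT", f"/_snapshot/{REPO}", {"type": "s3", "settings": settings}


def snapshot_name(now=None):
    """Build a lowercase, timestamped snapshot name."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("sync-%Y%m%d-%H%M%S")


def snapshot_request(name, indices="*"):
    """Take a snapshot on the primary cluster and wait for it to finish."""
    body = {"indices": indices, "include_global_state": False}
    return "PUT", f"/_snapshot/{REPO}/{name}?wait_for_completion=true", body


def restore_request(name, indices="*"):
    """Restore the same snapshot on the standby cluster."""
    body = {"indices": indices, "include_global_state": False}
    return "POST", f"/_snapshot/{REPO}/{name}/_restore", body


def send(base_url, request):
    """Send one (method, path, body) request to a cluster and decode the reply."""
    method, path, body = request
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A scheduler (cron, for example) would call `send(primary_url, snapshot_request(name))` and then `send(standby_url, restore_request(name))` at whatever interval the acceptable data lag allows; `primary_url` and `standby_url` are placeholders for the two clusters' HTTP endpoints.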

+5

Elasticsearch does not really have a dedicated cross-data-center replication feature. Replication is synchronous, so stretching one cluster across data centers is far from ideal, because the added latency can cause problems.

However, people do use shard allocation awareness to implement such a setup. Take a look at this write-up from Crate: https://crate.io/docs/en/latest/best_practice/multi_zone_setup.html
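As a rough illustration of what allocation awareness involves, the Python sketch below builds the body for a `PUT /_cluster/settings` call. The attribute name `zone` and the values `dc1` and `dc2` are assumptions; each node would also need the matching attribute in its own configuration (`node.attr.zone` in recent versions, a different key in old ones), so check the documentation for the version in use.

```python
def awareness_settings(attribute, values):
    """Build a cluster-settings body that spreads shard copies across zones.

    Forced awareness lists every expected zone, so that if one data center
    goes down its replicas are not all piled onto the surviving one.
    """
    prefix = "cluster.routing.allocation.awareness"
    return {
        "persistent": {
            f"{prefix}.attributes": attribute,
            f"{prefix}.force.{attribute}.values": ",".join(values),
        }
    }


# Hypothetical two-zone layout: one zone per data center.
settings_body = awareness_settings("zone", ["dc1", "dc2"])
```

With these settings a primary and its replica should not be allocated to the same zone, but every write still waits on the cross-data-center link, which is the latency concern raised above.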

The Elasticsearch documentation will also help, but be aware of the potential issues: http://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html#_network

+4

What you want is described in the Elasticsearch blog post on clustering across multiple data centers:

"You would have your application code write to a replicated queuing system (e.g. Kafka, Redis, RabbitMQ) and have a process (e.g. Logstash) in each DC read from the relevant queue and index documents into the local Elasticsearch cluster."
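The data flow in that quote can be sketched in Python with in-memory stand-ins. Here `queue.Queue` plays the role of the replicated queue (Kafka, Redis, or RabbitMQ in practice) and a plain dict plays the role of each local cluster; all names are hypothetical and nothing below talks to a real broker or to Elasticsearch.

```python
import queue


def publish(doc, dc_queues):
    """Application side: write only to the queue, never to Elasticsearch.

    Putting one copy on each per-DC queue stands in for the queue system's
    own cross-data-center replication.
    """
    for q in dc_queues.values():
        q.put(doc)


def drain_into_index(q, local_index):
    """Indexer side (Logstash's role): read the local queue and index
    each document into the local cluster. Returns the number indexed."""
    indexed = 0
    while True:
        try:
            doc = q.get_nowait()
        except queue.Empty:
            return indexed
        local_index[doc["id"]] = doc
        indexed += 1


# One queue and one "cluster" per data center.
dc_queues = {"dc1": queue.Queue(), "dc2": queue.Queue()}
dc_indexes = {"dc1": {}, "dc2": {}}

publish({"id": "1", "title": "first"}, dc_queues)
publish({"id": "2", "title": "second"}, dc_queues)

for dc, q in dc_queues.items():
    drain_into_index(q, dc_indexes[dc])
```

The point of the design is that a slow or unreachable data center only delays its own consumer: documents wait in the queue and are indexed once that consumer catches up.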

Please note that this blog post did not exist yet at the time you asked the question. I just came across it during my own research on the subject. It would be great to hear about other people's experience with this approach. Regards.

+2

If you need real-time synchronization between the two clusters, apply every operation performed on one cluster to the second as well. That is, your application or clients that write to one cluster must also write to the other. This would be the best approach for keeping the data in both clusters in sync in real time.
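A minimal Python sketch of such a dual write, using an in-memory stand-in for each cluster. `InMemoryCluster` and `dual_write` are hypothetical names; a real version would call an Elasticsearch client in place of the stand-in's `index` method.

```python
class InMemoryCluster:
    """Stand-in for an Elasticsearch cluster, for illustration only."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.docs = {}

    def index(self, doc_id, doc):
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        self.docs[doc_id] = doc


def dual_write(clusters, doc_id, doc):
    """Apply the same write to every cluster.

    Returns the names of clusters where the write failed, so the caller
    can retry or replay those writes later.
    """
    failed = []
    for cluster in clusters:
        try:
            cluster.index(doc_id, doc)
        except ConnectionError:
            failed.append(cluster.name)
    return failed


primary = InMemoryCluster("dc1")
secondary = InMemoryCluster("dc2")
failed = dual_write([primary, secondary], "42", {"user": "alice"})
```

Writes that fail on one side have to be recorded and replayed, otherwise the two clusters silently drift apart.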

Otherwise, if you don't mind missing some updates, what @Vineeth Mohan said is the way to go.

+1

Source: https://habr.com/ru/post/986490/

