Cannot talk to ZooKeeper - Updates are disabled.

We have one particular problem with ZooKeeper: SolrCloud suddenly loses its connection to ZK, and Solr starts throwing an exception that says "Cannot talk to ZooKeeper - Updates are disabled."

Our application has two separate Solr cluster installations in two different data centers. Both clusters have the same configuration and data and are expected to receive the same incremental load. Application users need their changes to show up in search almost immediately, so we run an incremental load every 10 seconds. That said, under normal conditions the updates within a 10-second window do not exceed 10,000.
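
To put that load in concrete terms, here is a small Python sketch of the arithmetic, using the figures from the post (one run every 10 seconds, at most 10,000 updates per run). The batch size and helper names are hypothetical and chosen only for illustration; the post does not say how the job actually batches its updates.

```python
# Sizing sketch for the incremental load described above.
# INTERVAL_SECONDS and MAX_UPDATES_PER_RUN come from the post;
# BATCH_SIZE is an assumed value, not something the post states.
from typing import Iterator, List, Sequence

INTERVAL_SECONDS = 10
MAX_UPDATES_PER_RUN = 10_000
BATCH_SIZE = 1_000  # hypothetical


def peak_update_rate(max_updates: int = MAX_UPDATES_PER_RUN,
                     interval: int = INTERVAL_SECONDS) -> float:
    """Worst-case average updates per second that one cluster must absorb."""
    return max_updates / interval


def split_into_batches(updates: Sequence[dict],
                       batch_size: int = BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive chunks of at most batch_size updates."""
    for start in range(0, len(updates), batch_size):
        yield list(updates[start:start + batch_size])


if __name__ == "__main__":
    run = [{"id": str(i)} for i in range(MAX_UPDATES_PER_RUN)]
    batches = list(split_into_batches(run))
    print(peak_update_rate(), len(batches))  # 1000.0 10
```

Because both data centers receive the same stream, the roughly 1,000 updates/s figure applies to each cluster separately.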

Each data center has its own quorum of 3 ZooKeeper servers running on dedicated machines. With this setup, we recently ran into the problem described above in one of the data centers: the connection to ZK suddenly drops and does not recover on its own. Oddly enough, this happened in only one data center, even though both DCs carry the same load.
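
For reference, a ZooKeeper ensemble stays available only while a strict majority of its servers can reach each other. The Python sketch below (helper names are hypothetical) shows what that means for the 3-server quorum in each data center: 2 servers must be up, so the ensemble tolerates the loss of one. Quorum is a property of the ensemble as a whole; an individual Solr node can still lose its own ZooKeeper connection while the ensemble keeps its majority.

```python
# Majority-quorum arithmetic for a ZooKeeper ensemble (hypothetical helpers).


def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(ensemble_size: int, servers_up: int) -> bool:
    """True if the servers that are up still form a majority."""
    return servers_up >= quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        # columns: ensemble size, quorum size, tolerated failures
        print(n, quorum_size(n), tolerated_failures(n))
```

The 4-server case illustrates why ensembles are usually sized with an odd number of servers: a fourth server raises the quorum to 3 without tolerating any additional failure.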

Although this did not affect searches against the index, it bombarded the application team with failure notifications (because of the application-specific alerting we have in place).

What did we do about it? A: To stop the flood of emails, we stopped the incremental jobs for about 5 minutes and then resumed them.

What did we observe? (This may also be a misinterpretation on our part.) A: Stopping the jobs somehow allowed ZK to recover on its own, so the incremental operations ran normally once they were resumed. No restart of ZK or SolrCloud was required.
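
The manual workaround (pause the incremental job, then resume it) could be automated so that a lost ZooKeeper connection produces a single alert instead of one per 10-second run. This is a minimal Python sketch, assuming the job pushes each batch through some `send_batch` callable that raises an exception containing Solr's error text; `IncrementalLoader`, `send_batch`, `alert` and the 5-minute cooldown are all hypothetical. It only reduces the notification noise and does not explain why the connection was lost.

```python
# Hypothetical self-pausing loader: mirrors the manual "stop for ~5 minutes,
# then resume" workaround. send_batch and alert are assumed callables.
import time
from typing import Callable, List, Optional

ZK_DOWN_MARKER = "Updates are disabled"  # substring of Solr's error message
COOLDOWN_SECONDS = 300.0                 # ~5 minutes, as in the manual workaround


class IncrementalLoader:
    def __init__(self,
                 send_batch: Callable[[List[dict]], None],
                 alert: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic,
                 cooldown: float = COOLDOWN_SECONDS) -> None:
        self.send_batch = send_batch
        self.alert = alert
        self.clock = clock
        self.cooldown = cooldown
        self.paused_until: Optional[float] = None  # None means "not paused"
        self.alerted = False  # True once this outage has been reported

    def run_once(self, batch: List[dict]) -> str:
        """Try to send one batch; return 'sent', 'paused' or 'skipped'."""
        now = self.clock()
        if self.paused_until is not None and now < self.paused_until:
            return "skipped"  # still cooling down; caller keeps the batch
        try:
            self.send_batch(batch)
        except Exception as exc:
            if ZK_DOWN_MARKER not in str(exc):
                raise  # unrelated failures propagate unchanged
            self.paused_until = now + self.cooldown
            if not self.alerted:  # one notification per outage, not per run
                self.alert(f"Solr updates disabled, pausing {self.cooldown:.0f}s: {exc}")
                self.alerted = True
            return "paused"
        # Success: clear the pause and re-arm alerting for the next outage.
        self.paused_until = None
        self.alerted = False
        return "sent"
```

A skipped or paused run leaves the batch with the caller, so the next successful run should include whatever was not sent.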

What would we like to know? Q: We saw nothing unusual in terms of ZK load during that time. So what could make ZK drop out on its own?

It would be very helpful if someone could explain the cause of this unexpected behavior.

Thanks in advance!

Source: https://habr.com/ru/post/983081/