How to handle connection loss with an Apache Curator distributed lock

Let's say I have two distributed processes that execute the following code, which uses ZooKeeper and Curator for distributed locking:

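```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.newClient(
            "localhost:2181", new ExponentialBackoffRetry(500, 2));
    client.start();
    InterProcessMutex lock = new InterProcessMutex(client, "/12345");
    System.out.println("before acquire");
    lock.acquire();
    System.out.println("lock has been acquired");
    try {
        // do some things that need to be done in an atomic fashion
    } finally {
        lock.release(); // release even if the work above throws
    }
    System.out.println("after release");
}
```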

Suppose the "do some things" comment stands for a few statements that must be executed by only one process at a time, for example, writes to several databases.

All of this seems fine until one of the Java processes loses its connection to ZooKeeper after it has acquired the lock.

According to the documentation:

It is strongly recommended that you add a ConnectionStateListener and watch for SUSPENDED and LOST state changes. If a SUSPENDED state is reported, you cannot be certain that you still hold the lock unless you subsequently receive a RECONNECTED state. If a LOST state is reported, you no longer hold the lock.
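The documentation's advice can be sketched as a small state tracker. This is a minimal, dependency-free sketch under stated assumptions: `ConnState`, `LockStatus` and `LockStateTracker` are hypothetical names, and `ConnState` only mirrors the Curator `ConnectionState` values used here so that the code compiles without Curator. In real code you would use Curator's `ConnectionState` directly and call `onStateChanged` from a `ConnectionStateListener` registered with `client.getConnectionStateListenable().addListener(...)`.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for Curator's ConnectionState (assumption: used only
// so this sketch compiles without Curator on the classpath).
enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

enum LockStatus { HELD, UNCERTAIN, NOT_HELD }

class LockStateTracker {
    private final AtomicReference<LockStatus> status =
            new AtomicReference<>(LockStatus.NOT_HELD);

    // Call right after lock.acquire() returns.
    void onAcquired() { status.set(LockStatus.HELD); }

    // Call right after lock.release().
    void onReleased() { status.set(LockStatus.NOT_HELD); }

    // In real code: call from ConnectionStateListener.stateChanged(client, newState).
    void onStateChanged(ConnState newState) {
        switch (newState) {
            case SUSPENDED:
                // Connection dropped: a lock we held is now uncertain.
                status.compareAndSet(LockStatus.HELD, LockStatus.UNCERTAIN);
                break;
            case RECONNECTED:
                // RECONNECTED after SUSPENDED (no LOST in between): still held.
                status.compareAndSet(LockStatus.UNCERTAIN, LockStatus.HELD);
                break;
            case LOST:
                // Session is gone, so the lock is gone; a later RECONNECTED
                // does not bring it back.
                status.set(LockStatus.NOT_HELD);
                break;
            default:
                break;
        }
    }

    LockStatus status() { return status.get(); }
}
```

Checking `status()` before each critical statement lets the process stop once the lock is uncertain or gone; it cannot undo a statement that was already running when the state changed.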

If I understand this correctly, at any time after acquiring the lock I could receive a notification that the lock was lost due to a network problem, after which some other process could acquire it. If so, there is no guarantee that after acquiring the lock you are the only process holding it. My precious statements, which must be executed by only one process at a time, could interleave with another process's.

Have I misunderstood this? If so, please explain what it means. If I haven't, how useful is Curator if it cannot guarantee exclusive access?

1 answer

This is a general rule of distributed systems: the network and other instances are unreliable. If your instance loses contact with the ZooKeeper ensemble, you cannot be sure of the state of your lock node. That is what a SUSPENDED connection state change means: internally, ZooKeeper has notified Curator that the connection to its ZooKeeper server was lost.

That said, you can safely assume that no other instance will get the lock until your session expires, so what you do is up to you. Also note that the meaning of the LOST connection state changed in Curator 3.x. Prior to Curator 3.x, the LOST state meant that your retry policy had been exhausted. In 3.x, Curator sets an internal timer when the connection is SUSPENDED, and the LOST state means that the session has expired. So, for many applications, you can safely ignore SUSPENDED and only exit your lock when LOST is received.
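The "ignore SUSPENDED, exit only on LOST" policy for Curator 3.x can be sketched as follows. Again this is a dependency-free sketch under assumptions: `ConnState` is a hypothetical stand-in for Curator's `ConnectionState`, and `LostOnlyGuard` and its methods are illustrative names, not Curator API. The guard sets a flag on LOST, and the critical section checks that flag before starting each step.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for Curator's ConnectionState (assumption).
enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

class LostOnlyGuard {
    private final AtomicBoolean lost = new AtomicBoolean(false);

    // Curator 3.x+ policy: SUSPENDED/RECONNECTED are ignored; only LOST
    // (session expired) means the lock is no longer held.
    void onStateChanged(ConnState newState) {
        if (newState == ConnState.LOST) {
            lost.set(true);
        }
    }

    // Runs the steps in order and refuses to start a new step once LOST
    // has been seen. Returns the number of steps that completed.
    int runCriticalSection(List<Runnable> steps) {
        int done = 0;
        for (Runnable step : steps) {
            if (lost.get()) {
                break; // lock gone: do not begin another step
            }
            step.run();
            done++;
        }
        return done;
    }
}
```

The check only narrows the window: a step already in progress when LOST arrives still runs to completion, which is the limitation raised in the question.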

All of this aside: even when using a JDK lock within a single JVM, you should be prepared to handle thread interruption. Having your application handle SUSPENDED/LOST from Curator is semantically the same thing.
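The JDK analogy can be made concrete with a `ReentrantLock`: the lock holder works in small steps, stops when its thread is interrupted, and releases the lock in a `finally` block. With Curator, one option (an assumption, not something the answer prescribes) is to have the LOST handler call `interrupt()` on the worker thread. `InterruptibleWorker`, `work` and `runAndInterrupt` are hypothetical names for this sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class InterruptibleWorker {
    // Holds the lock while doing work in small steps; stops as soon as the
    // thread is interrupted. Returns the number of steps completed.
    static int work(ReentrantLock lock, int maxSteps, CountDownLatch started)
            throws InterruptedException {
        lock.lockInterruptibly();
        int done = 0;
        try {
            started.countDown(); // signal that the lock is now held
            while (done < maxSteps) {
                try {
                    Thread.sleep(10); // stand-in for one unit of real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the flag visible
                    break;
                }
                done++;
            }
        } finally {
            lock.unlock(); // always release, even when interrupted
        }
        return done;
    }

    // Starts a worker, interrupts it once it holds the lock (the role a
    // LOST handler could play with Curator) and returns its completed steps.
    static int runAndInterrupt(ReentrantLock lock, int maxSteps) {
        CountDownLatch started = new CountDownLatch(1);
        AtomicInteger result = new AtomicInteger(-1);
        Thread worker = new Thread(() -> {
            try {
                result.set(work(lock, maxSteps, started));
            } catch (InterruptedException e) {
                result.set(-1); // interrupted before acquiring the lock
            }
        });
        worker.start();
        try {
            started.await();
            worker.interrupt();
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            worker.interrupt();
        }
        return result.get();
    }
}
```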

Hope this helps (note that I'm the main author of Apache Curator)


Source: https://habr.com/ru/post/1260999/

