Implement deadlock detection with Apache ZooKeeper

I work for a small software company and have been assigned to research a distributed lock manager for us. It needs to work with both Java and C++.

I have been working with ZooKeeper for several weeks and have implemented general locks (read and write locks) following the documentation. I now need to implement deadlock detection. If every client could maintain a full lock graph, this would be quick and easy. However, you cannot reliably observe every change to a node in ZooKeeper, so maintaining an accurate graph is not possible. This means that every time I check for deadlock, I would need to load the state of many locks, which seems impractical.
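For context, the client-side approach I ruled out amounts to maintaining a wait-for graph (an edge from each waiting session to the session holding the lock it wants) and searching it for a cycle. Here is a minimal, self-contained sketch of that idea; all class and method names here are hypothetical, and this is the in-memory graph search only, not the ZooKeeper bookkeeping that makes it hard:

```java
import java.util.*;

// Hypothetical sketch of a wait-for graph with cycle detection.
// An edge waiter -> holder means "waiter is blocked on a lock that holder owns".
public class WaitForGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    public void addWait(String waiter, String holder) {
        edges.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    /** Returns true if following wait-for edges from start leads back to start. */
    public boolean hasCycleFrom(String start) {
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String session = stack.pop();
            for (String next : edges.getOrDefault(session, Collections.emptySet())) {
                if (next.equals(start)) {
                    return true; // reached the starting session again: deadlock cycle
                }
                if (visited.add(next)) {
                    stack.push(next);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        WaitForGraph g = new WaitForGraph();
        g.addWait("s1", "s2");
        g.addWait("s2", "s3");
        g.addWait("s3", "s1"); // closes the cycle s1 -> s2 -> s3 -> s1
        System.out.println(g.hasCycleFrom("s1")); // true
        System.out.println(g.hasCycleFrom("s4")); // false
    }
}
```

The graph search itself is trivial; the problem described above is that no client can keep `edges` accurate without seeing every lock change.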

Another solution would be to implement deadlock detection inside the ZooKeeper server itself, which is what I am working on now. Each client would create a node inside a '/waiting' directory, named after its session ID, whose data names the lock it is waiting for. Since each lock has an ephemeral owner, I would have enough information to detect a deadlock.

The problem is that the ZooKeeper server does not provide the synchronization guarantees that the ZooKeeper client does. In addition, the server is not as well documented as the client, because you normally should not need to touch it.

So my question is this: how can I implement deadlock detection using Apache ZooKeeper? I see many people recommend ZooKeeper as a distributed lock manager, but if it cannot support deadlock detection, then no one should use it for that purpose.


EDIT:

I have a working solution. I cannot guarantee its correctness, but it has passed all my tests.

The heart of the deadlock detection algorithm is my checkForDeadlock method. Here is the additional information you need to know:

  • Only one client may run deadlock detection at a time.
  • First, the client tries to obtain a lock on a resource. If the resource is already locked and the client wants to wait until it becomes available, the client checks for deadlock. If waiting for the resource would not lead to a deadlock, it creates a znode in a special directory identifying that this client is waiting for this resource. That line looks like this: waitNode = zooKeeper.create(waitingPath + "/" + sessionID, resource.getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
  • No other client may begin checking for deadlock until this client has created its wait node.
  • If two clients try to acquire locks almost simultaneously, and granting both would lead to a deadlock, there is a small chance that instead of the first client receiving the lock and the second being rejected, the first client is rejected and the second gets the lock. This should not be a problem.
  • checkForDeadlock throws a DeadlockException if it encounters a deadlock. Otherwise, it returns normally.
  • Locks are granted in order. If a resource holds a read lock and a write lock is waiting, and another client wants a read lock, it must wait until the write lock has been granted and then released.
  • bySequenceNumber is a comparator that sorts znodes according to the sequence number that ZooKeeper appends to the end of sequential znodes.

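The bySequenceNumber comparator is not shown in the post. As a point of reference, ZooKeeper appends a 10-digit, zero-padded sequence number to sequential znodes (e.g. "read-0000000007"), so a comparator can simply compare the numeric suffix. This is a sketch under that assumption, with hypothetical class and method names:

```java
import java.util.*;

// Sketch of a comparator like the bySequenceNumber mentioned above (names assumed).
// ZooKeeper sequential znodes end in a 10-digit, zero-padded counter, so ordering
// by the parsed suffix reproduces creation order regardless of the name prefix.
public class SequenceOrder {
    static final Comparator<String> bySequenceNumber =
            Comparator.comparingLong(SequenceOrder::sequenceOf);

    static long sequenceOf(String znode) {
        // The last 10 characters are the sequence number ZooKeeper appended.
        return Long.parseLong(znode.substring(znode.length() - 10));
    }

    public static void main(String[] args) {
        List<String> locks = new ArrayList<>(Arrays.asList(
                "write-0000000012", "read-0000000003", "read-0000000007"));
        locks.sort(bySequenceNumber);
        System.out.println(locks); // [read-0000000003, read-0000000007, write-0000000012]
    }
}
```

Comparing the raw strings lexicographically would also work only if every prefix had the same length, which is why parsing the suffix is the safer choice.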
The code:

    private void checkForDeadlock(String pathToResource) throws DeadlockException {
        // Algorithm:
        // For each client who holds a lock on this resource:
        //   If this client is me, announce deadlock.
        //   Otherwise, if this client is waiting for a reserved resource,
        //   recursively check for deadlock on that resource.
        try {
            List<String> lockQueue = zooKeeper.getChildren(pathToResource, false);
            // Last I checked, children is implemented as an ArrayList.
            // lockQueue is the list of locks on this resource.
            // FIXME There is a slight chance that lockQueue could be empty.
            Collections.sort(lockQueue, bySequenceNumber);
            ListIterator<String> lockQueueIterator = lockQueue.listIterator();
            String grantedLock = lockQueueIterator.next();
            // grantedLock is one lock on this resource.
            do {
                // lockQueue must contain a write lock, because there is a lock waiting.
                String lockOwner = null;
                try {
                    lockOwner = Long.toString(
                            zooKeeper.exists(pathToResource + "/" + grantedLock, false).getEphemeralOwner());
                    // lockOwner is one client who holds a lock on this resource.
                } catch (NullPointerException e) {
                    // Locks may be released while I'm running deadlock detection. I got a
                    // NullPointerException because the lock I was currently looking at was deleted.
                    // Since the lock was deleted, its owner was obviously not part of a deadlock.
                    // Therefore I can ignore this lock and move on to the next one.
                    // (Note that a lock can be deleted if and only if its owner is not part of a deadlock.)
                    continue;
                }
                if (lockOwner.equals(sessionID)) { // If this client is me.
                    throw new DeadlockException("Waiting for this resource would result in a deadlock.");
                }
                try {
                    // XXX: Is it possible that reservedResource could be null?
                    String reservedResource =
                            new String(zooKeeper.getData(waitingPath + "/" + lockOwner, false, new Stat()));
                    // reservedResource is the resource that this client is waiting for.
                    // If this client is not waiting for a resource, see exception.

                    // I only recursively check the next reservedResource if I haven't checked it before.
                    // I need to do this because, while I'm running my deadlock detection, another client
                    // may attempt to acquire a lock that would cause a deadlock. Without this check,
                    // I would loop in that deadlock cycle indefinitely.
                    if (checkedResources.add(reservedResource)) {
                        checkForDeadlock(reservedResource); // Depth-first search
                    }
                } catch (KeeperException.NoNodeException e) {
                    // lockOwner is not waiting for a resource.
                } catch (KeeperException e) {
                    e.printStackTrace(syncOut);
                }
                // This loop needs to run for each lock that is currently being held on the resource.
                // There are two possibilities:
                // A. There is exactly one write lock on this resource. (Any other locks would be waiting locks.)
                //    In this case, the do-while loop ensures that the write lock has been checked.
                //    The condition that requires that the current lock is a read lock ensures that
                //    no locks after the write lock will be checked.
                // B. There are one or more read locks on this resource.
                //    In this case, I just check that the next lock is a read lock before moving on.
            } while (grantedLock.startsWith(readPrefix)
                    && (grantedLock = lockQueueIterator.next()).startsWith(readPrefix));
        } catch (NoSuchElementException e) {
            // The condition for the do-while loop assumes that there is a lock waiting on the resource.
            // This assumption was made because a client just reported that it was waiting on the resource.
            // However, there is a small chance that the client has since gotten the lock, or even
            // released it, before we check the locks on the resource.
            // FIXME (This may be a problem.)
            // In such a case, the lockQueueIterator.next() call could throw a NoSuchElementException.
            // We can safely assume that we are finished searching this branch, and therefore return.
        } catch (KeeperException e) {
            e.printStackTrace(syncOut);
        } catch (InterruptedException e) {
            e.printStackTrace(syncOut);
        }
    }
1 answer

You need two things to detect a deadlock: a list of lock owners and a list of lock waiters. The standard ZooKeeper lock recipe gives you both if you write some kind of node ID into the znodes you create.

You do not need to see every change in ZooKeeper to detect deadlocks. A deadlock is not something that appears and disappears quickly; by definition, a deadlock persists until you do something about it. So if you write your code so that clients watch every lock node they are interested in, each client will eventually see the owners and waiters for every lock, and will therefore see the deadlock.

However, you need to be careful. A client may not see updates in order, since updates can occur while the client is re-registering a watch. So if a client detects a deadlock, it must double-check that the deadlock is real by re-reading the owners and waiters of the locks involved and verifying that the cycle still exists.
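The double-check step can be sketched as follows. This is a hypothetical illustration, not the answerer's code: the Supplier stands in for re-reading the lock owners and waiters from ZooKeeper, and the cycle is reported only if every edge of the suspected wait-for cycle still exists in the fresh read:

```java
import java.util.*;
import java.util.function.Supplier;

// Sketch of the suggested double-check: re-read the lock state and confirm
// that every edge of the suspected deadlock cycle is still present.
// readWaitsFor stands in for a fresh read from ZooKeeper (assumption).
public class DeadlockDoubleCheck {
    static boolean confirmCycle(Supplier<Map<String, String>> readWaitsFor, List<String> cycle) {
        Map<String, String> fresh = readWaitsFor.get();
        for (int i = 0; i < cycle.size(); i++) {
            String waiter = cycle.get(i);
            String holder = cycle.get((i + 1) % cycle.size());
            if (!holder.equals(fresh.get(waiter))) {
                return false; // state changed underneath us; the suspected deadlock was stale
            }
        }
        return true; // every edge re-confirmed: the deadlock is real
    }

    public static void main(String[] args) {
        // s1 waits on s2 and s2 waits on s1: a real two-party deadlock.
        Map<String, String> state = Map.of("s1", "s2", "s2", "s1");
        System.out.println(confirmCycle(() -> state, List.of("s1", "s2"))); // true

        // s2 has since been granted its lock, so the stale cycle is rejected.
        Map<String, String> changed = Map.of("s1", "s2");
        System.out.println(confirmCycle(() -> changed, List.of("s1", "s2"))); // false
    }
}
```

Because a true deadlock cannot resolve on its own, a cycle that survives the re-read is safe to report; a cycle that does not was just an out-of-order view of the watches.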


Source: https://habr.com/ru/post/914065/

