Hazelcast dead node detection

Here is my simplified use case:

  • I have two embedded nodes, each running in its own JVM on the same physical machine. I start them and they form a simple cluster.
  • Both nodes try to acquire the same lock.
  • The first node to acquire the lock holds it for 30 seconds.
  • If I kill the node that holds the lock, the cluster needs between 5 and 10 seconds to conclude that the node is dead and release the lock.

My question is: can the interval between killing the node that holds the lock and the cluster actually releasing the lock be configured? I need it to be less than 1 second.

I tried some of the available properties that seemed to be related to this problem:

hazelcast.socket.connect.timeout.seconds
hazelcast.client.heartbeat.timeout
hazelcast.client.invocation.timeout.seconds

None of these helped; I did not notice any change in the lock's behavior.

Update:

These two properties seem to do the trick:

<property name="hazelcast.socket.connect.timeout.seconds">1</property>
<property name="hazelcast.connection.monitor.max.faults">1</property>

I have yet to find out if this will cause stability problems in a real use scenario. In this simple test, it works quite well.
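
For reference, here is a minimal sketch of where the two properties would go in a declarative configuration file. It assumes the usual hazelcast.xml layout with a top-level <properties> element; the schema namespace on the root element is omitted for brevity, and the rest of the configuration (network, group, and so on) is left out:

```xml
<hazelcast>
    <properties>
        <!-- Fail a socket connection attempt after 1 second -->
        <property name="hazelcast.socket.connect.timeout.seconds">1</property>
        <!-- Treat a member's connection as broken after a single fault -->
        <property name="hazelcast.connection.monitor.max.faults">1</property>
    </properties>
</hazelcast>
```

The values are the ones from my test above, not a recommendation; settings this aggressive may flag healthy nodes as dead on a slow or congested network.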

