I use the Apache Curator framework to create a ZK-managed cluster.
When a node in the cluster suddenly loses connection with ZK, it tries to connect to it every 5 seconds. I am using a RetryForever policy with the specified time for this.
As long as the sessionTimeout / connectionTimeout has not fully expired, we keep trying to reconnect.
But even if we bring ZK back up and establish a connection to it during this time, we still get strange messages in the logs:
Thu Nov 30 20:47:51.574 GMT 2017| INFO | org.apache.zookeeper.ClientCnxn$SendThread | Socket connection established to zk_1.default/138.122.177.23:2181, initiating session |Client Details{sessionTag:{}}| localhost-startStop-1-SendThread(zk_1.default:2181)
Thu Nov 30 20:47:51.592 GMT 2017| INFO | org.apache.zookeeper.ClientCnxn$SendThread | Unable to read additional data from server sessionid 0x1600ea13dcd0000, likely server has closed socket, closing socket connection and attempting reconnect |Client Details{sessionTag:{}}| localhost-startStop-1-SendThread(zk_1.default:2181)
Why are we still receiving these messages, and why can't we fully connect to the recently restarted ZK host?
A little later, I discovered that this error means that ZK has exhausted its maxClientCnxns parameter (the maximum number of concurrent connections a single client IP may open to a ZK server), but I did not find a way to configure it through Curator, only in tests.
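For reference, maxClientCnxns is a ZooKeeper server option rather than a client-side one, so as far as I can tell it belongs in the server's configuration file (zoo.cfg) and not in the Curator client setup. A minimal fragment, where the port matches the logs above and the limit value is only illustrative:

```
# zoo.cfg (server side); values below are illustrative assumptions
clientPort=2181
# maximum concurrent connections per client IP; 0 removes the limit
maxClientCnxns=60
```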
On the server side of ZK, I see the following errors:
2017-12-04 15:48:29,972 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181: NIOServerCnxnFactory@192 ] - Accepted socket connection from /192.168.107.4:37130
2017-12-04 15:48:29,974 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181: ZooKeeperServer@915 ] - Refusing session request for client /138.122.177.23:37130 as it has seen zxid 0xd our last zxid is 0x0 client must try another server
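The second server message compares two transaction ids: the last zxid the client has seen (0xd) and the server's own last zxid (0x0). Below is a simplified sketch of that comparison as I read it from the log text. It is not ZooKeeper's actual code, and the class and method names are hypothetical:

```java
// Hypothetical illustration of the check described by the log message
// "Refusing session request ... as it has seen zxid 0xd our last zxid is 0x0".
final class ZxidCheck {

    // A server refuses a client that has already seen a newer transaction id
    // than the server itself has processed.
    static boolean serverAcceptsSession(long clientLastZxidSeen, long serverLastZxid) {
        return clientLastZxidSeen <= serverLastZxid;
    }

    public static void main(String[] args) {
        long clientLastZxidSeen = 0xdL; // value reported in the server log
        long serverLastZxid = 0x0L;     // value reported in the server log
        System.out.println("accepted=" + serverAcceptsSession(clientLastZxidSeen, serverLastZxid));
    }
}
```

With the values from the log the check fails, which matches the refusal. A last zxid of 0x0 on the server side would be consistent with a server that came back up without its previous transaction history.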