How to deal with ZooKeeper's EndOfStreamException when Curator retries the connection?

I use the Apache Curator framework to build a ZK-managed cluster.

When a node in the cluster suddenly loses its connection to ZK, it tries to reconnect every 5 seconds. I am using a RetryForever policy with that interval for this.
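To make the retry behavior concrete, here is a minimal plain-Java sketch of what a fixed-interval "retry forever" policy amounts to. It is not Curator's implementation and does not use the Curator API; `FixedIntervalRetry`, `retryForever`, and the `intervalMs` parameter are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Callable;

/** Illustrative only: a fixed-interval, never-give-up retry loop (hypothetical helper, not Curator code). */
final class FixedIntervalRetry {
    private FixedIntervalRetry() {}

    /**
     * Calls the action until it returns without throwing,
     * sleeping intervalMs milliseconds after every failed attempt.
     */
    static <T> T retryForever(Callable<T> action, long intervalMs) {
        while (true) {
            try {
                return action.call();
            } catch (Exception attemptFailed) {
                // The attempt failed: wait for the fixed interval, then try again.
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException interrupted) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", interrupted);
                }
            }
        }
    }
}
```

With an interval of 5000 ms this corresponds to the "reconnect every 5 seconds" behavior described above: the loop never gives up on its own, so a connection attempt that the server keeps rejecting will be repeated indefinitely.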

As long as the session timeout and connection timeout have not fully elapsed, we keep trying to reconnect.

But even if we bring ZK back up and a connection to it is established within that time, we still get strange messages in the logs:

    Thu Nov 30 20:47:51.574 GMT 2017| INFO | org.apache.zookeeper.ClientCnxn$SendThread | Socket connection established to zk_1.default/138.122.177.23:2181, initiating session |Client Details{sessionTag:{}}| localhost-startStop-1-SendThread(zk_1.default:2181)
    Thu Nov 30 20:47:51.592 GMT 2017| INFO | org.apache.zookeeper.ClientCnxn$SendThread | Unable to read additional data from server sessionid 0x1600ea13dcd0000, likely server has closed socket, closing socket connection and attempting reconnect |Client Details{sessionTag:{}}| localhost-startStop-1-SendThread(zk_1.default:2181)

Why do we keep receiving these messages, and why can't we fully reconnect to the recently restarted ZK host?

  1. A little later, I found that this error means that ZK has exhausted its maxClientCnxns limit (the maximum number of client connections to ZK), but I could not find a way to configure it through Curator, only in tests.

  2. On the server side of ZK, I see the following errors:

     2017-12-04 15:48:29,972 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181: NIOServerCnxnFactory@192 ] - Accepted socket connection from /192.168.107.4:37130
     2017-12-04 15:48:29,974 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181: ZooKeeperServer@915 ] - Refusing session request for client /138.122.177.23:37130 as it has seen zxid 0xd our last zxid is 0x0 client must try another server
1 answer

The answer is here: http://grokbase.com/t/cloudera/cdh-user/134hrpr3nc/zookeeperserver-refusing-session-request-for-client-any-ideas

In the example from that thread, the client had been talking to a ZK server and the last transaction ID (zxid) it saw was 0x11be4 (72676 in decimal). The server it is now trying to connect to has a last zxid of 0x3f82 (16258 in decimal), which is much lower. In other words, the client has seen more recent data than this server can provide, which could lead to inconsistent reads. Instead of giving the client stale data, the server refuses the session, on the assumption that the client will reconnect to another server in the quorum that has more recent data, or that the lagging server will eventually receive a snapshot from another quorum member and catch up.

The log in the question shows the same situation: the client has seen zxid 0xd, while the server reports its last zxid as 0x0. A last zxid of 0x0 suggests that the restarted server came up without its previous data.
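The comparison described above can be sketched in a few lines of Java. This is a simplified illustration, not ZooKeeper's source code: `ZxidCheck` and its method names are hypothetical. The epoch/counter helpers reflect how a zxid is composed (epoch in the high 32 bits, counter in the low 32 bits).

```java
/** Illustrative only: the "client has seen a newer zxid than the server" check (hypothetical class). */
final class ZxidCheck {
    private ZxidCheck() {}

    /** True when the client has seen a transaction the server does not have yet. */
    static boolean serverMustRefuse(long clientLastZxidSeen, long serverLastZxid) {
        return clientLastZxidSeen > serverLastZxid;
    }

    /** High 32 bits of a zxid: the leader epoch. */
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    /** Low 32 bits of a zxid: the transaction counter within the epoch. */
    static long counterOf(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }
}
```

With the values from the server log in the question, `ZxidCheck.serverMustRefuse(0xdL, 0x0L)` is true, so the session request is refused no matter how many times the client retries against that server.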


Source: https://habr.com/ru/post/1273780/

