ConnectionLoss for /hbase + connection reset by peer?

I am running a Hadoop MapReduce job on my local machine (pseudo-distributed) that reads from and writes to HBase. I periodically get an error that kills the job, even when the machine is left alone with no other significant processes running (see the log below). The output of a ZooKeeper dump after the job has died looks like this; the number of clients grows after each failed run:

    HBase is rooted at /hbase
    Master address: SS-WS-M102:60000
    Region server holding ROOT: SS-WS-M102:60020
    Region servers:
     SS-WS-M102:60020
    Quorum Server Statistics:
     ss-ws-m102:2181
      Zookeeper version: 3.3.3-cdh3u0--1, built on 03/26/2011 00:20 GMT
      Clients:
       /192.168.40.120:58484[1](queued=0,recved=39199,sent=39203)
       /192.168.40.120:37129[1](queued=0,recved=162,sent=162)
       /192.168.40.120:58485[1](queued=0,recved=39282,sent=39316)
       /192.168.40.120:58488[1](queued=0,recved=39224,sent=39226)
       /192.168.40.120:58030[0](queued=0,recved=1,sent=0)
       /192.168.40.120:58486[1](queued=0,recved=39248,sent=39267)

My development team is currently using the CDH3u0 distribution, so HBase 0.90.1. Is this resolved in a later version, or is there something I can do with my current setup? Should I just plan on restarting ZK and periodically killing off clients? I am open to any reasonable option that will let my jobs run to completion consistently.

    2012-06-27 13:01:07,289 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server SS-WS-M102/192.168.40.120:2181
    2012-06-27 13:01:07,289 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to SS-WS-M102/192.168.40.120:2181, initiating session
    2012-06-27 13:01:07,290 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server SS-WS-M102/192.168.40.120:2181, unexpected error, closing socket connection and attempting reconnect
    java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
        at sun.nio.ch.IOUtil.read(IOUtil.java:169)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
        at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:858)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1130)
    [lines above repeat 6 more times]
    2012-06-27 13:01:17,890 ERROR org.apache.hadoop.hbase.mapreduce.TableInputFormat: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:991)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:302)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:293)
        at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:156)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:91)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:605)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
    Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:147)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:989)
        ... 15 more
    Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:902)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:133)
        ... 16 more
+6
3 answers

It turns out I was running into ZooKeeper's low default connection limit (which I believe has been raised in later versions). I tried setting a higher limit in hbase-site.xml:

    <property>
      <name>hbase.zookeeper.property.maxClientCnxns</name>
      <value>35</value>
    </property>

But this did not work unless it was (also?) specified in zoo.cfg:

    # can put this number much higher if desired
    maxClientCnxns=35

The job now runs for several hours at a time, and my ZK client list peaks at 12 entries.

+2

I have had problems like this in the past. A lot of the time with HBase/Hadoop the error messages do not point at the real problem you are facing, so you have to get creative with it.

Here is what I have found; it may or may not apply to you:

Are you opening a lot of connections to the table, and are you closing them when you are done? This can happen in an MR job if you are doing Scans/Gets inside the Mapper or Reducer (which I do not think you want to do if it can be avoided).
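If a per-record lookup inside a Mapper really cannot be avoided, one way to keep connection churn down is to open a single table handle per task and close it when the task finishes. Below is a rough sketch against the 0.90-era client API; the class, table, and column names (LookupMapper, lookup, cf, q) are placeholders I have invented, not anything from the asker's job:

    // Hypothetical mapper, not the asker's code: opens one HTable per task in
    // setup() and closes it in cleanup(), instead of creating a new handle for
    // every record.
    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;

    public class LookupMapper extends TableMapper<ImmutableBytesWritable, Text> {

        private HTable lookupTable;  // reused for the whole task

        @Override
        protected void setup(Context context) throws IOException {
            // One table handle per map task, not one per map() call.
            lookupTable = new HTable(context.getConfiguration(), "lookup");
        }

        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            Result side = lookupTable.get(new Get(row.get()));
            byte[] v = side.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"));
            if (v != null) {
                context.write(row, new Text(Bytes.toString(v)));
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException {
            if (lookupTable != null) {
                lookupTable.close();  // release the handle when the task finishes
            }
        }
    }

The point is simply to create as few HBase handles per task as possible and to release them deterministically in cleanup(); whether this is actually the culprit depends on how the job builds its HTable and Configuration objects.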

Also, sometimes I run into similar problems when my Mapper or Reducer writes to the same row a LOT. Try to distribute your writes, or minimize them, to ease the problem.
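As one hypothetical reading of "minimize writes": fold everything destined for a row into a single Put (for example in the Reducer) instead of emitting one Put per incoming value. A sketch only, with an invented column family and qualifier (cf, count):

    // Hypothetical reducer, not the asker's code: folds all values for a key
    // into a single Put so each output row is written once, not once per value.
    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;

    public class SinglePutReducer
            extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();  // combine everything for this row in memory
            }
            // One Put per row key instead of one Put per incoming value.
            Put put = new Put(Bytes.toBytes(key.toString()));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(sum));
            context.write(new ImmutableBytesWritable(put.getRow()), put);
        }
    }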

It would also help if you described the nature of your MR job in more detail. What does it do? Do you have sample code?

+1

Check the following settings:

ZooKeeper session timeout (zookeeper.session.timeout): try increasing it and test.

ZooKeeper tick time (tickTime): increase it and test.
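For reference, here is a sketch of what those two knobs might look like in hbase-site.xml; the values are illustrative, not recommendations. zookeeper.session.timeout is the HBase-side setting, while the hbase.zookeeper.property.* prefix only takes effect when HBase manages ZooKeeper itself, so with an external quorum you would raise tickTime in zoo.cfg instead. Also note that by default ZooKeeper caps the negotiated session timeout at 20 * tickTime (maxSessionTimeout).

    <property>
      <name>zookeeper.session.timeout</name>
      <value>120000</value>
    </property>
    <property>
      <name>hbase.zookeeper.property.tickTime</name>
      <value>6000</value>
    </property>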

Check the ulimit settings (ulimit is a Linux command; check it for the user that hadoop/hbase runs as).

In the case of ulimit, you should raise the following parameters to higher values (for example via /etc/security/limits.conf, sketched below):

open files: set this to 32K or more.

max user processes: set this to unlimited.
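A sketch of how those limits could be made persistent via /etc/security/limits.conf, assuming (hypothetically) that the Hadoop/HBase daemons and tasks run as a user named hadoop; the new limits only apply to sessions started after the change:

    # /etc/security/limits.conf ("hadoop" is a placeholder for the actual user)
    hadoop  soft  nofile  32768
    hadoop  hard  nofile  32768
    hadoop  soft  nproc   unlimited
    hadoop  hard  nproc   unlimited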

After making these changes, the error will most likely disappear.

0
