UnknownHostException in tasktracker in a Hadoop cluster

I installed a Hadoop pseudo-distributed cluster (with the jobtracker, tasktracker, and namenode all on a single machine) by following instructions, and it works fine. Now I am trying to add a second node to this cluster as another tasktracker.

When I look at the logs on node 2, all the logs look fine except the tasktracker's. I get an endless loop of the error message below. It seems that the tasktracker is trying to use the hostname SSP-SANDBOX-1.mysite.com rather than the IP address. This hostname is not in /etc/hosts, so I assume that is the problem. I do not have root access to add it to /etc/hosts.

Is there any property or configuration I can change so that it stops trying to connect using the hostname?

Many thanks,

2011-01-18 17:43:22,896 ERROR org.apache.hadoop.mapred.TaskTracker: 
Caught exception: java.net.UnknownHostException: unknown host: SSP-SANDBOX-1.mysite.com
        at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
        at org.apache.hadoop.ipc.Client.call(Client.java:720)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy5.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1033)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
1 answer

This blog post may be helpful:

http://western-skies.blogspot.com/2010/11/fix-for-exceeded-maxfaileduniquefetches.html

The short answer is that Hadoop performs a reverse lookup of hostnames even if you specify IP addresses in the configuration files. In your environment, for Hadoop to work, SSP-SANDBOX-1.mysite.com must resolve to the IP address of that machine, and a reverse lookup of that IP address must resolve back to SSP-SANDBOX-1.mysite.com.
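You can check both directions of resolution from the JVM's point of view with a small standalone snippet (illustrative only: it is demonstrated with localhost here, and the class name and method split are my own; substitute SSP-SANDBOX-1.mysite.com and its IP when running on the cluster node):

```java
import java.net.InetAddress;

public class ResolveCheck {
    // Forward lookup: hostname -> IP. This is the step that is failing
    // in the tasktracker log with UnknownHostException.
    static String forward(String host) throws Exception {
        return InetAddress.getByName(host).getHostAddress();
    }

    // Reverse lookup: IP -> canonical hostname, which Hadoop also relies on.
    static String reverse(String ip) throws Exception {
        return InetAddress.getByName(ip).getCanonicalHostName();
    }

    public static void main(String[] args) throws Exception {
        // Shown with localhost; on the cluster node, use the hostname from
        // the error and the machine's real IP, and verify the two agree.
        System.out.println("forward: " + forward("localhost"));
        System.out.println("reverse: " + reverse("127.0.0.1"));
    }
}
```

If `forward` throws the same UnknownHostException as the log, the problem is name resolution on that node rather than anything in the Hadoop configuration.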

Since you can't modify the hosts file yourself, you'll need to ask someone with root access to add the entry to /etc/hosts, or have the hostname set up in DNS.
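For reference, the missing entry would look something like this in /etc/hosts (the IP address below is a placeholder; whoever has root access would substitute the node's real address):

```
192.168.0.10   SSP-SANDBOX-1.mysite.com   SSP-SANDBOX-1
```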


Source: https://habr.com/ru/post/1786266/
