Hadoop does not show my job in the job tracker, even though it runs successfully

Problem: When I submit a job to my Hadoop 2.2.0 cluster, it does not appear in the job tracker, but the job completes successfully. I can see the result, and the output is printed correctly as it runs.

I tried many different options, but the job tracker still doesn't see the job. If I start a streaming job from the Hadoop 2.2.0 command line, it appears in the job tracker, but when I submit it via the hadoop-client API, it does not. I am looking at the web UI on port 8088 to check whether the job appears.

Environment: OS X Mavericks, Java 1.6, Hadoop 2.2.0 single-node cluster, Tomcat 7.0.47

Code:

    try {
        configuration.set("fs.defaultFS", "hdfs://127.0.0.1:9000");
        configuration.set("mapred.jobtracker.address", "localhost:9001");
        Job job = createJob(configuration);
        job.waitForCompletion(true);
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Unable to execute job", e);
    }
    return null;

etc/hadoop/mapred-site.xml

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapred.job.tracker</name>
            <value>localhost:9001</value>
        </property>
    </configuration>

etc/hadoop/core-site.xml

    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/tmp/hadoop-${user.name}</value>
            <description>A base for other temporary directories.</description>
        </property>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>
2 answers

The solution was to configure the job with additional YARN configuration parameters. I had wrongly assumed that the Java hadoop-client API would pick up the configuration parameters from the cluster's configuration directory. I was able to diagnose the problem by enabling verbose logging via log4j.properties in my unit tests. It showed that jobs were running in the local runner and were not being submitted to the YARN ResourceManager. With a little trial and error, I was able to configure the job so that it was submitted to the YARN ResourceManager.
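For reference, a minimal log4j.properties along these lines is enough to surface the local-vs-YARN decision in the client logs; the DEBUG level on org.apache.hadoop is the important part, and the appender layout is just an example:

    log4j.rootLogger=INFO, stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n
    # Turn up Hadoop client logging to see which job runner is chosen
    log4j.logger.org.apache.hadoop=DEBUG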

Code:

    try {
        configuration.set("fs.defaultFS", "hdfs://127.0.0.1:9000");
        configuration.set("mapreduce.jobtracker.address", "localhost:54311");
        configuration.set("mapreduce.framework.name", "yarn");
        configuration.set("yarn.resourcemanager.address", "localhost:8032");
        Job job = createJob(configuration);
        job.waitForCompletion(true);
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Unable to execute job", e);
    }
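Alternatively, instead of hard-coding the values, you can point the client at the cluster's own XML files with Configuration.addResource. A sketch, assuming the Hadoop config lives under /usr/local/hadoop/etc/hadoop (adjust the path to your install):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    // The client does not pick up the cluster's site files automatically
    // unless they are on its classpath, so load them explicitly.
    // The path below is an assumption; use your cluster's config directory.
    Configuration configuration = new Configuration();
    configuration.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"));
    configuration.addResource(new Path("/usr/local/hadoop/etc/hadoop/mapred-site.xml"));
    configuration.addResource(new Path("/usr/local/hadoop/etc/hadoop/yarn-site.xml"));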

I see that you are using Hadoop 2.2.0. Are you using MRv1 or MRv2? The daemons are different in MRv2 (YARN). There is no JobTracker in MRv2, although you may still see a placeholder page for the JobTracker interface.

The ResourceManager web interface should display your submitted jobs. The default web URL for the ResourceManager is: http://<ResourceManagerHost>:8088

Replace <ResourceManagerHost> with the IP address or hostname of the node where the ResourceManager is running.
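If you prefer to check programmatically rather than through the web UI, the YARN client API can list the applications the ResourceManager knows about. A minimal sketch; the localhost:8032 address is an assumption matching the single-node setup above:

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.client.api.YarnClient;

    public class ListYarnApps {
        public static void main(String[] args) throws Exception {
            // Assumes the ResourceManager is reachable at localhost:8032;
            // otherwise put yarn-site.xml on the classpath instead.
            Configuration conf = new Configuration();
            conf.set("yarn.resourcemanager.address", "localhost:8032");

            // Connect to the ResourceManager and list all known applications
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(conf);
            yarnClient.start();
            try {
                List<ApplicationReport> apps = yarnClient.getApplications();
                for (ApplicationReport app : apps) {
                    System.out.println(app.getApplicationId() + " " + app.getName()
                            + " " + app.getYarnApplicationState());
                }
            } finally {
                yarnClient.stop();
            }
        }
    }

If your job was submitted to YARN rather than run locally, it should appear in this list as well as on the port 8088 UI.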

You can learn more about the YARN architecture in the Apache Hadoop YARN documentation.

