Why does Hadoop only start local jobs by default?

I wrote my own Hadoop program, and it works in pseudo-distributed mode on my laptop. However, when I put the program on a cluster that can run the hadoop jar examples, it starts a local task by default, even though I specify paths to the HDFS files. The output is below - any suggestions?

 ./hadoop -jar MyRandomForest_oob_distance.jar hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1

 12/03/16 16:21:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
 12/03/16 16:21:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
 12/03/16 16:21:25 INFO mapred.JobClient: Running job: job_local_0001
 12/03/16 16:21:25 INFO mapred.MapTask: io.sort.mb = 100
 12/03/16 16:21:25 INFO mapred.MapTask: data buffer = 79691776/99614720
 12/03/16 16:21:25 INFO mapred.MapTask: record buffer = 262144/327680
 12/03/16 16:21:25 WARN mapred.LocalJobRunner: job_local_0001
 java.io.FileNotFoundException: File /user/randomforest/input/genotype1.txt does not exist.
     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
     at Data.Data.loadData(Data.java:103)
     at MapReduce.DearMapper.loadData(DearMapper.java:261)
     at MapReduce.DearMapper.setup(DearMapper.java:332)
     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
 12/03/16 16:21:26 INFO mapred.JobClient: map 0% reduce 0%
 12/03/16 16:21:26 INFO mapred.JobClient: Job complete: job_local_0001
 12/03/16 16:21:26 INFO mapred.JobClient: Counters: 0
 Total Running time is: 1 secs
+6
3 answers

LocalJobRunner was chosen because of your configuration - most likely it has the mapred.job.tracker property set to local, or not set at all (in which case the default is local). To check, go to wherever you extracted/installed Hadoop, look in etc/hadoop/, and see if a mapred-site.xml file exists (for me it didn't; there was a file called mapred-site.xml.template). In that file (create it if it does not exist), make sure it has the following property:

 <configuration>
   <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
   </property>
 </configuration>
  • See the source of org.apache.hadoop.mapred.JobClient.init(JobConf), where this choice is made

What is the value of this configuration property in the Hadoop configuration on the machine you are submitting from? Also confirm that the hadoop executable you are running refers to this configuration (and that you don't have 2+ installations configured differently) - run which hadoop and trace any symbolic links you come across.
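As a sketch of that check, the snippet below extracts mapreduce.framework.name from a mapred-site.xml. It writes a sample config to a temp file so it is self-contained; in practice you would point CONF at your real $HADOOP_CONF_DIR/mapred-site.xml (the grep/sed extraction is illustrative, not part of the original answer):

```shell
# Diagnostic from the answer (requires hadoop on the PATH), shown for reference:
#   readlink -f "$(which hadoop)"

# Self-contained illustration: check which framework a mapred-site.xml selects.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

# Empty output here means the property is unset and jobs default to local.
framework=$(grep -A1 'mapreduce.framework.name' "$CONF" \
  | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p')
echo "$framework"
```

On a machine with more than one Hadoop installation, running the same extraction against each installation's mapred-site.xml is a quick way to catch mismatched configurations.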

Alternatively, you can override this at job submission time, if you know the JobTracker hostname and port, using the -jt option:

 hadoop jar MyRandomForest_oob_distance.jar -jt hostname:port \
     hdfs://montana-01:8020/user/randomforest/input/genotype1.txt \
     hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt \
     hdfs://montana-01:8020/user/randomforest/output1_distance/ \
     hdfs://montana-01:8020/user/randomforest/input/genotype101.txt \
     hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt \
     33 500 1
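Note the WARN line in the log ("Use GenericOptionsParser ... Applications should implement Tool"): generic options like -jt and -D are only stripped and applied if the driver goes through GenericOptionsParser, which is usually done by implementing Tool and launching via ToolRunner. A minimal driver sketch (class name, job name, and argument positions here are placeholders, not taken from the question; it needs the Hadoop client libraries on the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: ToolRunner parses generic options (-jt, -D, -conf)
// and applies them to the Configuration before run() sees the rest of args.
public class RandomForestDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "random-forest"); // placeholder job name
        job.setJarByClass(RandomForestDriver.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        // mapper/reducer classes and output types would be set here,
        // as in your existing job setup code
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new RandomForestDriver(), args));
    }
}
```

With a driver structured like this, -jt hostname:port (and -D mapreduce.framework.name=yarn) are consumed before your own argument handling runs.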
+9

If you are using Hadoop 2 and your job runs locally instead of on the cluster, make sure that mapred-site.xml contains the mapreduce.framework.name property with a value of yarn. You also need to configure the auxiliary shuffle service in yarn-site.xml.
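For reference, the aux service this answer refers to is typically configured like this (a sketch of the standard mapreduce_shuffle setting; older releases may also require the ShuffleHandler class property, so check your distribution's documentation):

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```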

Check out the Cloudera Hadoop 2 operator migration blog for more information.

+3

I had the same problem: every mapreduce v2 (mrv2) / YARN task ran only with the mapred.LocalJobRunner

 INFO mapred.LocalJobRunner: Starting task: attempt_local284299729_0001_m_000000_0 

The ResourceManager and NodeManagers were reachable, and mapreduce.framework.name was set to yarn.

Setting HADOOP_MAPRED_HOME before submitting the job fixed the problem for me:

 export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce 

Thanks, Dan.

+2

Source: https://habr.com/ru/post/910928/

