Hadoop Single Node installation on Windows 7

Question

Hadoop Single Node installation on Windows 7

I am new to hadoop and trying to get one install of Hadoop 0.20.2 on my Windows 7 machine.

My questions are doubled - one regarding the completeness of the installation itself, and the other - errors in the stage of reducing the sample program Word Count.

My installation steps are as follows:

I follow http://blog.benhall.me.uk/2011/01/installing-hadoop-0210-on-windows.html for the installation procedure.

I installed cygwin and configured ssh without password on my localhost My java version:

java version "1.7.0_02" Java(TM) SE Runtime Environment (build 1.7.0_02-b13) Java HotSpot(TM) 64-Bit Server VM (build 22.0-b10, mixed mode)

The content of conf / core-site.xml:

 <configuration> <property> <name>fs.default.name</name> <value>hdfs://localhost:9000</value> </property> </configuration>

The content of conf / hdfs-site.xml:

 <configuration> <property> <name>dfs.replication</name> <value>1</value> </property> </configuration>

The content of conf / mapred-site.xml:

 <configuration> <property> <name>mapred.job.tracker</name> <value>localhost:9001</value> </property> </configuration>

I set the JAVA_HOME variable and the hadoop version command prints 0.20.2 hasoop namenode -format creates DFS without errors

start-all.sh prints that namenode, secondarynamenode, datanode, jobtracker and tasktracker are all started.

however, the jps command prints:

 $ jps 4584 Jps 11008 JobTracker 2084 NameNode

I noticed that jps printed pids' tasktracker, secondarynamenode as well.

I can view the output

  http: // localhost: 50030 for the jobtracker, 
 http: // localhost: 50060 for the tasktracker and 
 http: // localhost: 50070 for the namenode.

I tried both to put and receive commands in hdfs, and they were successful:

 bin/hadoop fs -mkdir In bin/hadoop fs -put *.txt In mkdir temp bin/hadoop fs -get In temp ls -l temp/In $ ls -l temp/In/ total 365 348624 Mar 24 23:59 CHANGES.txt 13366 Mar 24 23:59 LICENSE.txt 101 Mar 24 23:59 NOTICE.txt 1366 Mar 24 23:59 README.txt

I could also browse these files by browsing DFS via the http interface for namenode

Is my installation complete?
If so, why doesn't the jps command display the pids of all five components?
If not, what steps do I need to complete the installation?
What are the other health checks used to verify that the installation is complete?

Initially, I believed that my installation would be completed and run the sample program to scale WordCount in accordance with http://jayant7k.blogspot.com/2010/06/writing-your-first-map-reduce-program.html

I get the following output:

 12/03/25 00:10:26 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 12/03/25 00:10:26 INFO input.FileInputFormat: Total input paths to process : 1 12/03/25 00:10:27 INFO mapred.JobClient: Running job: job_201203242348_0001 12/03/25 00:10:28 INFO mapred.JobClient: map 0% reduce 0% 12/03/25 00:10:35 INFO mapred.JobClient: map 100% reduce 0% 12/03/25 00:21:29 INFO mapred.JobClient: Task Id : attempt_201203242348_0001_r_0 00000_0, Status : FAILED Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. 12/03/25 00:32:25 INFO mapred.JobClient: Task Id : attempt_201203242348_0001_r_0 00000_1, Status : FAILED Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. 12/03/25 00:44:02 INFO mapred.JobClient: Task Id : attempt_201203242348_0001_r_0 00000_2, Status : FAILED Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. 12/03/25 00:55:00 INFO mapred.JobClient: Job complete: job_201203242348_0001 12/03/25 00:55:00 INFO mapred.JobClient: Counters: 12 12/03/25 00:55:00 INFO mapred.JobClient: Job Counters 12/03/25 00:55:00 INFO mapred.JobClient: Launched reduce tasks=4 12/03/25 00:55:00 INFO mapred.JobClient: Launched map tasks=1 12/03/25 00:55:00 INFO mapred.JobClient: Data-local map tasks=1 12/03/25 00:55:00 INFO mapred.JobClient: Failed reduce tasks=1 12/03/25 00:55:00 INFO mapred.JobClient: FileSystemCounters 12/03/25 00:55:00 INFO mapred.JobClient: HDFS_BYTES_READ=13366 12/03/25 00:55:00 INFO mapred.JobClient: FILE_BYTES_WRITTEN=23511 12/03/25 00:55:00 INFO mapred.JobClient: Map-Reduce Framework 12/03/25 00:55:00 INFO mapred.JobClient: Combine output records=0 12/03/25 00:55:00 INFO mapred.JobClient: Map input records=244 12/03/25 00:55:00 INFO mapred.JobClient: Spilled Records=1887 12/03/25 00:55:00 INFO mapred.JobClient: Map output bytes=19699 12/03/25 00:55:00 INFO mapred.JobClient: Combine input records=0 12/03/25 00:55:00 INFO mapred.JobClient: Map output records=1887

The map task seems complete, but the reduction task shows the following error in the logs:

 2012-03-25 00:10:35,202 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0: Got 1 new map-outputs 2012-03-25 00:10:40,193 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts) 2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201203242348_0001_m_000000_0, compressed len: 23479, decompressed len: 23475 2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 23475 bytes (23479 raw bytes) into RAM from attempt_201203242348_0001_m_000000_0 2012-03-25 00:11:35,194 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Need another 1 map output(s) where 1 is already in progress 2012-03-25 00:11:35,194 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts) 2012-03-25 00:12:35,197 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Need another 1 map output(s) where 1 is already in progress 2012-03-25 00:12:35,197 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts) 2012-03-25 00:13:35,202 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Need another 1 map output(s) where 1 is already in progress 2012-03-25 00:13:35,202 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts) 2012-03-25 00:13:40,249 INFO org.apache.hadoop.mapred.ReduceTask: Failed to shuffle from attempt_201203242348_0001_m_000000_0 java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:150) at java.net.SocketInputStream.read(SocketInputStream.java:121) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) at sun.net.www.http.ChunkedInputStream.fastRead(ChunkedInputStream.java:239) at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:680) at java.io.FilterInputStream.read(FilterInputStream.java:133) at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2959) at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:149) at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101) at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1522) at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408) at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261) at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

The following are the contents of the task tracking logs:

 2012-03-25 00:10:27,910 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201203242348_0001_m_000002_0 task state:UNASSIGNED 2012-03-25 00:10:27,915 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201203242348_0001_m_000002_0 2012-03-25 00:10:27,915 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201203242348_0001_m_000002_0 2012-03-25 00:10:28,453 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201203242348_0001_m_625085452 2012-03-25 00:10:28,454 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201203242348_0001_m_625085452 spawned. 2012-03-25 00:10:29,217 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201203242348_0001_m_625085452 given task: attempt_201203242348_0001_m_000002_0 2012-03-25 00:10:29,523 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_m_000002_0 0.0% setup 2012-03-25 00:10:29,524 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201203242348_0001_m_000002_0 is done. 2012-03-25 00:10:29,524 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201203242348_0001_m_000002_0 was 0 2012-03-25 00:10:29,526 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2 2012-03-25 00:10:29,718 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201203242348_0001_m_625085452 exited. Number of tasks it ran: 1 2012-03-25 00:10:30,911 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201203242348_0001/attempt_201203242348_0001_m_000002_0/output/file.out in any of the configured local directories 2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201203242348_0001_m_000000_0 task state:UNASSIGNED 2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201203242348_0001_m_000000_0 2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201203242348_0001_m_000000_0 2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: Received KillTaskAction for task: attempt_201203242348_0001_m_000002_0 2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201203242348_0001_m_000002_0 2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201203242348_0001_m_000002_0 done; removing files. 2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201203242348_0001_m_000002_0 not found in cache 2012-03-25 00:10:31,077 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201203242348_0001_m_-1399302881 2012-03-25 00:10:31,077 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201203242348_0001_m_-1399302881 spawned. 2012-03-25 00:10:31,812 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201203242348_0001_m_-1399302881 given task: attempt_201203242348_0001_m_000000_0 2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_m_000000_0 1.0% 2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201203242348_0001_m_000000_0 is done. 2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201203242348_0001_m_000000_0 was 0 2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2 2012-03-25 00:10:32,822 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201203242348_0001_m_-1399302881 exited. Number of tasks it ran: 1 2012-03-25 00:10:33,982 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201203242348_0001_r_000000_0 task state:UNASSIGNED 2012-03-25 00:10:33,982 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201203242348_0001_r_000000_0 2012-03-25 00:10:33,982 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201203242348_0001_r_000000_0 2012-03-25 00:10:34,057 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201203242348_0001_r_625085452 2012-03-25 00:10:34,057 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201203242348_0001_r_625085452 spawned. 2012-03-25 00:10:34,852 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201203242348_0001_r_625085452 given task: attempt_201203242348_0001_r_000000_0 2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 23479 bytes for reduce: 0 from map: attempt_201203242348_0001_m_000000_0 given 23479/23475 2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.33:50060, dest: 192.168.1.33:60790, bytes: 23479, op: MAPRED_SHUFFLE, cliID: attempt_201203242348_0001_m_000000_0 2012-03-25 00:10:41,153 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy > 2012-03-25 00:10:44,158 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy > 2012-03-25 00:16:05,244 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 23479 bytes for reduce: 0 from map: attempt_201203242348_0001_m_000000_0 given 23479/23475 2012-03-25 00:16:05,244 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.33:50060, dest: 192.168.1.33:60864, bytes: 23479, op: MAPRED_SHUFFLE, cliID: attempt_201203242348_0001_m_000000_0 2012-03-25 00:16:05,249 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy > 2012-03-25 00:16:08,249 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy > 2012-03-25 00:21:25,251 FATAL org.apache.hadoop.mapred.TaskTracker: Task: attempt_201203242348_0001_r_000000_0 - Killed due to Shuffle Failure: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

I opened ports 9000 and 9001 in the Windows firewall. I checked the telnet output to make sure that these ports are really open:

 C:\Windows\system32>netstat -a -n | grep -e "500[367]0" TCP 0.0.0.0:50030 0.0.0.0:0 LISTENING TCP 0.0.0.0:50060 0.0.0.0:0 LISTENING TCP 0.0.0.0:50070 0.0.0.0:0 LISTENING TCP [::]:50030 [::]:0 LISTENING TCP [::]:50060 [::]:0 LISTENING TCP [::]:50070 [::]:0 LISTENING C:\Windows\system32>netstat -a -n | grep -e "900[01]" TCP 127.0.0.1:9000 0.0.0.0:0 LISTENING TCP 127.0.0.1:9000 127.0.0.1:60332 ESTABLISHED TCP 127.0.0.1:9000 127.0.0.1:60987 ESTABLISHED TCP 127.0.0.1:9001 0.0.0.0:0 LISTENING TCP 127.0.0.1:9001 127.0.0.1:60410 ESTABLISHED TCP 127.0.0.1:60332 127.0.0.1:9000 ESTABLISHED TCP 127.0.0.1:60410 127.0.0.1:9001 ESTABLISHED TCP 127.0.0.1:60987 127.0.0.1:9000 ESTABLISHED

Could you help with both installation problems and the task of reduction?

I looked

  http://wiki.apache.org/hadoop/SocketTimeout

and a few other links and tried the suggestions, but without any success.

I appreciate your patience in reading this post and will be happy to provide further details.

Thanks in advance.

+4

windows-7 hadoop

user1290793 Mar 25 '12 at 5:20

source share

1 answer

Tejas patil · Accepted Answer · 2012-03-25T14:34:35+0000

See this line in your logs:

 2012-03-25 00:10:30,911 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201203242348_0001/attempt_201203242348_0001_m_000002_0/output/file.out in any of the configured local directories

I assume you need to check hadoop.tmp.dir and mapred.local.dir . You mentioned the settings you use, and therefore the values of these two parameters are the default values. The default values of these parameters are set here . Install them in the appropriate place and try again.

NOTE. Before you change this, you need to stop hadoop and start after you finish.

Hadoop Single Node installation on Windows 7

More articles: