Hadoop - Help needed to understand processing steps

I have a compressed file containing 8 XML files of 5-10 KB each; I took this data for testing purposes. I wrote a single mapper to unzip the compressed file. The program uses the MR2 API on Hadoop 2.7.1 in pseudo-distributed mode. I start the cluster with sbin/start-dfs.sh. I can see the unpacked output on the file system within a few seconds, but processing continues for another 5-6 minutes, and I do not know why.
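The question does not include the mapper code, but the core decompression step it describes can be sketched with only the JDK's java.util.zip (in the real mapper, the input stream would come from FileSystem.open on the archive path; that part is assumed, not shown in the question):

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;

public class UnzipSketch {
    // Read every entry from a zip stream and return a name -> contents map.
    // A real Hadoop mapper would instead write each entry back to HDFS.
    public static Map<String, byte[]> unzip(InputStream in) throws IOException {
        Map<String, byte[]> out = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            byte[] buf = new byte[4096];
            while ((entry = zin.getNextEntry()) != null) {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                int n;
                while ((n = zin.read(buf)) > 0) {
                    bos.write(buf, 0, n);
                }
                out.put(entry.getName(), bos.toByteArray());
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny zip in memory to demonstrate the round trip.
        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(zipped)) {
            zout.putNextEntry(new ZipEntry("a.xml"));
            zout.write("<a/>".getBytes("UTF-8"));
            zout.closeEntry();
        }
        Map<String, byte[]> files =
                unzip(new ByteArrayInputStream(zipped.toByteArray()));
        System.out.println(files.size() + " "
                + new String(files.get("a.xml"), "UTF-8"));
    }
}
```

Decompression of 8 small files this way takes milliseconds, which matches the observation that the output appears in HDFS within seconds; the remaining minutes come from the job framework, not the unzip logic.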

[screenshot 1]

By this stage the MR program has already decompressed the files, and I can view / download them.

[screenshot 2]

I cannot tell what processing my MapReduce program is doing here. I am using the MR2 API in my code, so why does the MR1 API (mapred) appear in the logs? The situation gets worse with a 128 MB archive: it is decompressed after 5-10 minutes, and the rest of the time the job is busy with something else.

The performance I get is unacceptable, and I need to understand what Hadoop is doing in the second screenshot.

Please help me figure out whether this is an installation problem, a problem with my program, or something else.

1 answer

This is a configuration problem; I solved it by changing mapred-site.xml:

 <configuration>
   <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
   </property>
 </configuration>
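Note that with the framework set to yarn, the ResourceManager and NodeManager daemons must also be running; starting HDFS alone with start-dfs.sh is not enough. A minimal way to start and verify (paths assume a standard Hadoop 2.7 installation directory):

```shell
# Start YARN daemons in addition to HDFS (pseudo-distributed mode)
sbin/start-dfs.sh
sbin/start-yarn.sh

# Verify which framework the job client will use
bin/hdfs getconf -confKey mapreduce.framework.name   # should print: yarn

# Confirm the NodeManager has registered with the ResourceManager
bin/yarn node -list
```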

Source: https://habr.com/ru/post/910930/

