Hadoop - Help needed to understand processing steps

I have a compressed file containing 8 XML files of 5-10 KB each; I took this data for testing purposes. I wrote a single mapper to unzip the compressed file. The program uses the MR2 API on Hadoop 2.7.1 in pseudo-distributed mode. I start the cluster with sbin/start-dfs.sh. I can see the unpacked output on the file system within a few seconds, but processing continues for another 5-6 minutes, and I do not know why.
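The question does not include the mapper code, but the core decompression step it describes can be sketched with only the JDK's java.util.zip (in the real mapper, the input stream would come from FileSystem.open on the archive path; that part is assumed, not shown in the question):

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;

public class UnzipSketch {
    // Read every entry from a zip stream and return a name -> contents map.
    // A real Hadoop mapper would instead write each entry back to HDFS.
    public static Map<String, byte[]> unzip(InputStream in) throws IOException {
        Map<String, byte[]> out = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            byte[] buf = new byte[4096];
            while ((entry = zin.getNextEntry()) != null) {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                int n;
                while ((n = zin.read(buf)) > 0) {
                    bos.write(buf, 0, n);
                }
                out.put(entry.getName(), bos.toByteArray());
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny zip in memory to demonstrate the round trip.
        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(zipped)) {
            zout.putNextEntry(new ZipEntry("a.xml"));
            zout.write("<a/>".getBytes("UTF-8"));
            zout.closeEntry();
        }
        Map<String, byte[]> files =
                unzip(new ByteArrayInputStream(zipped.toByteArray()));
        System.out.println(files.size() + " "
                + new String(files.get("a.xml"), "UTF-8"));
    }
}
```

Decompression of 8 small files this way takes milliseconds, which matches the observation that the output appears in HDFS within seconds; the remaining minutes come from the job framework, not the unzip logic.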

[screenshot 1]

By this stage the MR program has already decompressed the files, and I can view / download them.

[screenshot 2]

I cannot tell what processing my MapReduce program is doing here. I am using the MR2 API in my code, so why does the MR1 API (mapred) appear in the logs? The situation gets worse with a 128 MB archive: it is decompressed after 5-10 minutes, and the rest of the time the job is busy with something else.

The performance I get is unacceptable, and I need to understand what Hadoop is doing in the second screenshot.

Please help me figure out whether this is an installation problem, a problem with my program, or something else.

1 answer

This is a configuration problem; I solved it by changing mapred-site.xml:

 <configuration>
   <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
   </property>
 </configuration>
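Note that with the framework set to yarn, the ResourceManager and NodeManager daemons must also be running; starting HDFS alone with start-dfs.sh is not enough. A minimal way to start and verify (paths assume a standard Hadoop 2.7 installation directory):

```shell
# Start YARN daemons in addition to HDFS (pseudo-distributed mode)
sbin/start-dfs.sh
sbin/start-yarn.sh

# Verify which framework the job client will use
bin/hdfs getconf -confKey mapreduce.framework.name   # should print: yarn

# Confirm the NodeManager has registered with the ResourceManager
bin/yarn node -list
```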

Source: https://habr.com/ru/post/910930/

