Weird Hadoop error - tasks get killed on their own

When I run my Hadoop application, I get the following error:

Request received to kill task 'attempt_201202230353_23186_r_000004_0' by user
Task has been KILLED_UNCLEAN by the user

The logs look clean. I run 28 reducers, and this does not happen for all of them. It happens for a few of them, and the affected reducer then starts again. I do not understand this. I have also noticed that with a small data set I rarely see this error.

2 answers

Three things can be tried:

Setting a counter
If Hadoop sees a counter showing that the job is progressing, it will not kill the task (see Arockiaraj Durairaj's answer). This seems the most elegant option, as it can also give you more insight into long-running tasks and where the hang-ups may be.

Longer task timeouts
Hadoop tasks time out after 10 minutes by default. Changing the timeout is somewhat brute force, but it may work. Imagine analyzing audio files that are usually 5 MB (songs), but you also have a few 50 MB files (entire albums). Hadoop stores each file in its own blocks, so if your HDFS block size is 64 MB, a 5 MB file and a 50 MB file each fit in a single block (see http://blog.cloudera.com/blog/2009/02/the-small-files-problem/ and the question "Small files and HDFS blocks"). However, the 5 MB job will run faster than the 50 MB one. The task timeout (mapred.task.timeout) can be increased in the job's code, per the answers to this similar question: How to fix "Task attempt_201104251139_0295_r_000006_0 failed to report status for 600 seconds."
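
The block arithmetic in the audio example can be checked with a small sketch. It assumes only that a file occupies as many blocks as the ceiling of its size divided by the block size. The class and method names are made up for illustration, and the 200 MB case is an extra data point that is not in the answer.

```java
class BlockCount {
    // Number of blocks a file of the given size occupies (ceiling division).
    static long blocksFor(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb; // block size used in the example above

        System.out.println(blocksFor(5 * mb, blockSize));   // song: 1
        System.out.println(blocksFor(50 * mb, blockSize));  // album: 1
        System.out.println(blocksFor(200 * mb, blockSize)); // hypothetical larger file: 4
    }
}
```

Both the song and the album map to one block, so they look alike when counting blocks even though one task has ten times as much data to work through.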

Increasing task attempts
Configure Hadoop to make more than the default 4 attempts (see Pradeep Gollakota's answer). This is the most brute-force method of the three. Hadoop will retry the task more times, but you may be masking an underlying problem (small servers, large data blocks, etc.).
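
As a rough sketch of the last two options, the snippet below assembles command-line overrides for the timeout and the attempt limit. Several things here are assumptions rather than part of the answers. The property mapred.reduce.max.attempts is the Hadoop 1.x name for the reduce attempt limit (newer releases use mapreduce.task.timeout and mapreduce.reduce.maxattempts). The -D form is only honored when the job driver goes through ToolRunner or GenericOptionsParser. The values 30 minutes and 6 attempts are purely illustrative, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

class JobOverrides {
    // Hadoop 1.x property names; the timeout value is in milliseconds.
    static final String TASK_TIMEOUT = "mapred.task.timeout";
    static final String REDUCE_ATTEMPTS = "mapred.reduce.max.attempts";

    // Build "-D key=value" arguments to place after the job's main class.
    static List<String> overrides(long timeoutMinutes, int reduceAttempts) {
        List<String> args = new ArrayList<>();
        args.add("-D");
        args.add(TASK_TIMEOUT + "=" + TimeUnit.MINUTES.toMillis(timeoutMinutes));
        args.add("-D");
        args.add(REDUCE_ATTEMPTS + "=" + reduceAttempts);
        return args;
    }

    public static void main(String[] args) {
        // Illustrative values only: 30-minute timeout, 6 attempts.
        System.out.println(String.join(" ", overrides(30, 6)));
    }
}
```

With these values it prints `-D mapred.task.timeout=1800000 -D mapred.reduce.max.attempts=6`. The default 10-minute timeout corresponds to 600000 ms, which matches the "600 seconds" in the quoted error.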


Can you try using a counter (a Hadoop counter) in your reduce logic? It looks like Hadoop cannot determine whether your reduce program is running or hanging. It waits a few minutes and kills it, even though your logic may still be executing.
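
To make the mechanism concrete, here is a toy model of it in plain Java. It is not Hadoop's implementation: ProgressWatchdog and its methods are hypothetical stand-ins, and time is simulated with a simple counter. In a real reducer written against the org.apache.hadoop.mapreduce API, the "report" step would be something like context.getCounter(group, name).increment(1) or context.progress() inside the loop over values (reporter.incrCounter(...) or reporter.progress() in the older mapred API).

```java
class ProgressWatchdog {
    private final long timeoutMs;
    private long lastReportMs;

    ProgressWatchdog(long timeoutMs, long startMs) {
        this.timeoutMs = timeoutMs;
        this.lastReportMs = startMs;
    }

    // What a counter increment or progress call amounts to: "still alive".
    void reportProgress(long nowMs) {
        lastReportMs = nowMs;
    }

    // The framework-side check: too long since the last report?
    boolean shouldKill(long nowMs) {
        return nowMs - lastReportMs > timeoutMs;
    }

    // Simulate one long reduce call. Each record takes perRecordMs; the task
    // reports every reportEvery records (0 means it never reports).
    // Returns true if the attempt finishes without being killed.
    static boolean survives(int records, long perRecordMs, int reportEvery, long timeoutMs) {
        ProgressWatchdog watchdog = new ProgressWatchdog(timeoutMs, 0);
        long now = 0;
        for (int i = 1; i <= records; i++) {
            now += perRecordMs; // simulated work
            if (watchdog.shouldKill(now)) {
                return false;
            }
            if (reportEvery > 0 && i % reportEvery == 0) {
                watchdog.reportProgress(now);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long tenMinutes = 600_000;
        // 2000 records at 1 s each is about 33 minutes inside one reduce call.
        System.out.println(survives(2000, 1000, 0, tenMinutes));   // false: never reports
        System.out.println(survives(2000, 1000, 100, tenMinutes)); // true: reports every 100 records
    }
}
```

The same amount of work is killed in the first case and survives in the second. This would also be consistent with the question's observation that small data sets rarely trigger the error, since each reduce call then finishes well inside the timeout.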


Source: https://habr.com/ru/post/909677/

