I submitted a job to a cluster of 4 hosts and saw that it was correctly distributed across the 4 nodes, with 1 map task per node.
Later, one of the nodes failed.
I stopped the TaskTracker on the failed node, added the node to the exclude file, and refreshed the node list with hadoop mradmin -refreshNodes. The failed node disappeared from the list of available nodes on the JobTracker administration pages.
Then I started the TaskTracker again, refreshed the nodes with mradmin again, and saw that the node reappeared in the TaskTracker list.
While that node was down, Hadoop re-scheduled its map task on another node, so that node started running 2 map tasks. My cluster is now unbalanced:
- 2 nodes are running 1 task each,
- 1 node is running 2 tasks,
- and 1 node (the one I restarted) is not running any tasks.
I killed the task attempt with hadoop job -kill-task attempt_201308010141_0001_m_000000_1, and it looks like it never starts again: I now see three nodes running 1 task each, 1 node with no tasks, and 1 pending task in the list.
Am I missing something? What is the correct way to "move" a task from one node to another?