YarnException: unauthorized request to start a container

I installed Hadoop 2.2.0 on a 3-node cluster. Everything went well: the NodeManager and DataNode are launched on each node. But when I run the wordcount example, the map phase reaches 100% and then the job fails with the following exception:

  map 100% reduce 0%
  13/11/28 09:57:15 INFO mapreduce.Job: Task Id : attempt_1385611768688_0001_r_000000_0, Status : FAILED
  Container launch failed for container_1385611768688_0001_01_000003 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
  This token is expired. current time is 1385612996018 found 1385612533275
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

I searched the Internet for a solution, but I could not figure it out. Please help me out.

+6
4 answers

This exception occurs when your nodes have different time settings. Make sure that all 3 nodes have the same time and timezone settings, and then restart the machines.

It worked for me. Hope this helps you too!
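As a rough sketch of that check (the hostnames are placeholders, and this assumes passwordless ssh between the nodes), you can compare the clock and timezone on all three nodes before restarting:

  # Placeholder hostnames; print each node's wall-clock time and timezone side by side.
  for host in master worker1 worker2; do
    ssh "$host" 'echo "$(hostname): $(date "+%Y-%m-%d %H:%M:%S %Z")"'
  done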

+6

One option would be to increase the lifetime of the container allocation by setting

yarn.resourcemanager.rm.container-allocation.expiry-interval-ms

which defaults to 10 minutes.

e.g. in Cloudera Manager:
Service-Wide / Advanced
YARN Service Advanced Configuration Snippet (Safety Valve) for yarn-site.xml

  <property>
    <name>yarn.resourcemanager.rm.container-allocation.expiry-interval-ms</name>
    <value>1000000</value>
  </property>
+3

In addition to the time settings, make sure that the nodes run NTP or are otherwise kept well synchronized. I had the same problem and found that one of the nodes had the wrong YEAR in its date. As soon as I brought the clocks to within a few seconds of each other, the error disappeared.
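A minimal sketch of forcing the clocks back in line by hand (this assumes ntpdate is installed and pool.ntp.org is reachable; the service name may be ntp instead of ntpd on Debian-based systems). The daemon should then keep them in sync going forward:

  # One-off resynchronization against a public NTP pool; run on every node that has drifted.
  sudo service ntpd stop       # ntpdate cannot run while the daemon holds UDP port 123
  sudo ntpdate pool.ntp.org    # step the clock to the reference time
  sudo service ntpd start
  date                         # verify the date (including the year) is now correct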

+3

If you see this error all of a sudden, it could be due to clock drift on virtual machines.

All virtual machines are prone to clock drift.

System time can drift by several minutes on long-running clusters if it is not kept in sync with a known good time source. As a result, your cluster nodes, each relying on its own system clock, gradually drift apart from one another.

Your Hadoop jobs may have been running successfully because the drift was not large enough to be noticeable. However, on long-running clusters, if one worker's clock drifts so far (relative to the master) that it exceeds the 10-minute interval, jobs will start failing, because the YARN containers allocated to those workers will already be marked EXPIRED by the time the ApplicationMaster tries to launch them.
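For reference, the size of the mismatch is visible directly in the two epoch-millisecond timestamps from the error above:

  # Gap between the NM's current time and the token's expiry timestamp in the original error:
  echo $(( (1385612996018 - 1385612533275) / 1000 ))   # about 462 seconds, i.e. roughly 7.7 minutes
  date -d @1385612996                                  # GNU date: 28 Nov 2013, matching the job log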

The key part:

"For any container, if the corresponding NM does not report to the RM that the container has started running within a configured interval of time, 10 minutes by default, the container is deemed dead and is expired by the RM."

You can learn more about YARN container allocation here: http://hortonworks.com/blog/apache-hadoop-yarn-resourcemanager/

So jobs will run if you increase yarn.resourcemanager.rm.container-allocation.expiry-interval-ms in the yarn-site.xml configuration file.
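A minimal sketch of doing that on a plain (non-Cloudera-Manager) installation; the config path and the Hadoop 2.x script location below are assumptions and may differ on your setup:

  # Raise the expiry interval in yarn-site.xml on the ResourceManager host (path is an assumption):
  sudo vi /etc/hadoop/conf/yarn-site.xml
  # Restart the ResourceManager so the new value takes effect (Hadoop 2.x sbin layout):
  $HADOOP_HOME/sbin/yarn-daemon.sh stop resourcemanager
  $HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager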

But this is only a temporary solution.


To fix the actual problem, you need to use some kind of time synchronization mechanism such as NTP.

NTP keeps the clocks of your master/worker nodes synchronized against global time servers.

You need to make sure that the NTP daemon is up and running on all nodes of the cluster. NTP must also remain "synchronized" (check with ntpstat) throughout the life of the cluster. Some obvious issues that can knock NTP out of sync (a quick status check is sketched after the link below):

  • Your firewall may be blocking UDP port 123.
  • You may be in an AD environment whose own time synchronization mechanism conflicts with NTP.

http://support.ntp.org/bin/view/Support/TroubleshootingNTP
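A rough status check, assuming the classic ntpd tooling (chrony-based systems use chronyc tracking instead):

  ntpstat                   # exits 0 only when the local clock is synchronized
  ntpq -p                   # list the peers the daemon is actually talking to
  ntpdate -q pool.ntp.org   # query-only probe: verifies that UDP port 123 traffic gets through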

0
