Container killed by YARN for exceeding memory limits. 52.6 GB of 50 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.

I am running a Spark job on 1 TB of data with the following configuration:

33 GB of executor memory, 40 executors, 5 cores per executor

17 GB memoryOverhead
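
Expressed as Spark properties, that setup corresponds roughly to the sketch below. This is my reconstruction, not the actual submit script; the application name is made up, and note that the legacy spark.yarn.executor.memoryOverhead property takes a value in MiB, so 17 GB is about 17408.

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Rough sketch of the configuration described above.
    val conf = new SparkConf()
      .setAppName("one-tb-job")                           // hypothetical name
      .set("spark.executor.memory", "33g")                // 33 GB heap per executor
      .set("spark.executor.instances", "40")              // 40 executors
      .set("spark.executor.cores", "5")                   // 5 cores per executor
      .set("spark.yarn.executor.memoryOverhead", "17408") // 17 GB off-heap overhead, in MiB

    val spark = SparkSession.builder().config(conf).getOrCreate()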

What are the possible causes of this error?

+5
2 answers

Where are you seeing this warning? In which logs, specifically? You are lucky you even get a warning :). Indeed, 17 GB sounds like enough, but then you have 1 TB of data. I have had to use more than 30 GB of overhead on less data than that.

The cause of the error is that YARN charges the container for memory that does not live inside the executor's heap. I have noticed that more tasks (partitions) mean much more memory used, and shuffles are usually the worst offenders, but beyond that I have not seen any other correlation with what I am doing. Somehow something eats memory unnecessarily.
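
For what it's worth, the error message's own suggestion is to raise that off-heap allowance. The 50 GB container cap in the message is presumably the 33 GB executor heap plus the 17 GB overhead, and the process actually used 52.6 GB, so the overhead was overshot by roughly 2.6 GB. A minimal sketch of that adjustment, assuming a build where the property is still spelled spark.yarn.executor.memoryOverhead and takes MiB; the 20 GB figure is illustrative, not something from this thread, and the larger container has to fit within the cluster's maximum allocation:

    import org.apache.spark.SparkConf

    // Sketch: give the container more headroom outside the 33 GB heap.
    // 20 GB = 20480 MiB is an illustrative value, not a recommendation.
    // On Spark 2.3+ the non-deprecated name is spark.executor.memoryOverhead.
    val conf = new SparkConf()
      .set("spark.executor.memory", "33g")
      .set("spark.yarn.executor.memoryOverhead", "20480") // MiB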

It seems the world is moving to Mesos; maybe it does not have this problem. Even better, just run Spark on its own, standalone.

More information: http://www.wdong.org/wordpress/blog/2015/01/08/spark-on-yarn-where-have-all-my-memory-gone/ (a deep dive into how YARN swallows memory). That link appears to be dead; this mirror may work: http://m.blog.csdn.net/article/details?id=50387104 . If neither does, try googling "spark on yarn, where have all my memory gone".

+3

One possible problem is that your virtual memory usage is growing very large relative to your physical memory. You can set yarn.nodemanager.vmem-check-enabled to false in yarn-site.xml to see what happens. If the error stops, that was likely the problem.
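
For reference, that is a NodeManager setting, so it goes into yarn-site.xml on the cluster nodes (a sketch of just that one property; it typically requires restarting the NodeManagers, and it is a diagnostic step rather than a fix):

    <!-- yarn-site.xml: disable only the virtual-memory check.
         The physical-memory check (pmem-check-enabled) is left untouched. -->
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>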

I answered a similar question elsewhere and provided additional information: fooobar.com/questions/629701 / ...

+1

Source: https://habr.com/ru/post/1232708/

