Where are you getting this warning from? Which logs specifically? You're lucky to get a warning at all :). Indeed, 17g seems like enough, but then you do have 1TB of data. I've had to use more than 30g for less data than that.
The cause of the error is that YARN uses additional memory for the container that does not live in the executor's memory space. I've noticed that more tasks (partitions) mean much more memory used, and shuffles are generally heavier; beyond that I haven't seen any other correlation with what I do. Something is eating memory unnecessarily.
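To make the container-versus-executor gap concrete, here is a rough sketch of how the YARN container request is sized. It assumes the documented Spark default of an overhead equal to 10% of executor memory with a 384 MB floor (older 1.x releases used a smaller factor), and it ignores YARN rounding the request up to its allocation increment. The function name `yarn_container_mb` is made up for illustration.

```python
def yarn_container_mb(executor_memory_mb, overhead_mb=None,
                      factor=0.10, minimum_mb=384):
    """Approximate memory (MB) requested from YARN for one executor.

    Assumption: overhead defaults to max(factor * executor memory, minimum_mb),
    which matches the documented default in recent Spark versions. Pass
    overhead_mb to model an explicit memoryOverhead setting instead.
    """
    if overhead_mb is None:
        overhead_mb = max(int(executor_memory_mb * factor), minimum_mb)
    return executor_memory_mb + overhead_mb


if __name__ == "__main__":
    heap = 17 * 1024  # the 17g executor from the question
    print("default overhead:", yarn_container_mb(heap), "MB")
    print("4g overhead:     ", yarn_container_mb(heap, overhead_mb=4096), "MB")
```

If the container keeps getting killed, raising the overhead explicitly (`spark.yarn.executor.memoryOverhead` on older versions, `spark.executor.memoryOverhead` on newer ones) is usually the first thing to try, rather than only increasing the executor heap.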
It seems the world is moving to Mesos; perhaps it does not have this problem. Even better, just use Spark standalone.
More info: http://www.wdong.org/wordpress/blog/2015/01/08/spark-on-yarn-where-have-all-my-memory-gone/ . This link seems to be dead (it's a deep dive into the way YARN gobbles memory). This link may work: http://m.blog.csdn.net/article/details?id=50387104 . If not, try googling "spark on yarn where have all my memory gone".