I get an OOM with a pretty small job when running pyspark on YARN.
The job gets submitted to Spark and YARN fine, but fails shortly afterwards with OOM (Exit status: 52):
ERROR cluster.YarnScheduler: Lost executor 1 on <ip>: Container marked as failed: containerID 2 on host: <ip>. Exit status: 52. Diagnostics: Exception from container-launch.
And when I check the yarn log files for this application, I see the following:
date 19:03:19 INFO hadoop.ColumnChunkPageWriteStore: written 10,852B for [user] BINARY: 11,052 values, 11,764B raw, 10,786B comp, 1 pages, encodings: [BIT_PACKED, PLAIN_DICTIONARY, RLE], dic { 3,607 entries, 78,442B raw, 3,607B comp}
date 19:03:20 INFO mapred.SparkHadoopMapRedUtil: attempt_date1903_0006_m_000013_0: Committed
date 19:03:20 INFO executor.Executor: Finished task 13.0 in stage 6.0 (TID 1058). 2077 bytes result sent to driver
date time:07 INFO executor.Executor: Executor is trying to kill task 70.0 in stage 6.0 (TID 1115)
date time:07 INFO executor.Executor: Executor is trying to kill task 127.0 in stage 6.0 (TID 1170)
Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/daemon.py", line 157, in manager
code = worker(sock)
File "/usr/lib/spark/python/pyspark/daemon.py", line 61, in worker
worker_main(infile, outfile)
File "/usr/lib/spark/python/pyspark/worker.py", line 136, in main
if read_int(infile) == SpecialLengths.END_OF_STREAM:
File "/usr/lib/spark/python/pyspark/serializers.py", line 545, in read_int
raise EOFError
EOFError
date time:07 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
I'm not sure what's causing this problem; it looks like it has trouble with the files, but the Spark driver has 8 GB of memory.
The cluster looks like this:
21 nodes, 64GB each, 8 cores each
spark-defaults.conf:
spark.executor.memory=30928mb
spark.driver.memory=8g
spark.executor.instances=30
spark.executor.cores=7
spark.yarn.executor.memoryOverhead=19647
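To rule out these values simply not being picked up, something like this from the pyspark shell should print what the running application actually got (just a rough sketch, nothing job-specific):

from pyspark import SparkContext

# In the pyspark shell a SparkContext already exists; getOrCreate() returns it
# instead of starting a new one.
sc = SparkContext.getOrCreate()

# Print the values the application actually picked up for the settings above.
for key in ("spark.driver.memory",
            "spark.executor.memory",
            "spark.executor.instances",
            "spark.executor.cores",
            "spark.yarn.executor.memoryOverhead"):
    print(key, sc.getConf().get(key, "<not set>"))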
yarn-site.xml:
yarn.scheduler.maximum-allocation-vcores = 1024
yarn.scheduler.maximum-allocation-mb = 61430
yarn.nodemanager.resource.memory-mb = 61430
yarn.nodemanager.resource.cpu-vcores = 7 (left one for the driver)
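For reference, the raw memory math with these numbers (a back-of-the-envelope sketch; I'm assuming spark.executor.memory means 30928 MB):

# Per-executor YARN container = executor memory + memoryOverhead.
executor_memory_mb = 30928
memory_overhead_mb = 19647
nodemanager_mb = 61430

per_container_mb = executor_memory_mb + memory_overhead_mb
print("per executor container:", per_container_mb, "MB")               # 50575 MB
print("fits once per node:", per_container_mb <= nodemanager_mb)       # True
print("fits twice per node:", 2 * per_container_mb <= nodemanager_mb)  # False

# So at most one of these containers fits on each of the 21 NodeManagers,
# even though spark.executor.instances asks for 30.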
Am I setting something wrong here?
In addition, the Spark master UI shows:
URL: spark://ip:7077
REST URL: spark://ip:6066
Alive Workers: 30
Cores in use: 240 Total, 0 Used
Memory in use: 1769.7 GB Total, 0.0 B Used
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE
Any idea why the containers would need more memory? And I'm not sure whether I should increase or decrease the memory settings.