How to configure memory for a small job in PySpark?

I get an OOM with a pretty small job when running PySpark on YARN.

The job gets submitted to Spark and YARN fine, but shortly afterwards it dies with an OOM, Exit status: 52:

    ERROR cluster.YarnScheduler: Lost executor 1 on <ip>: Container marked as failed: containerID 2 on host: <ip>. Exit status: 52. Diagnostics: Exception from container-launch.

And when I check the YARN log files for this application, I see the following:

    date 19:03:19 INFO hadoop.ColumnChunkPageWriteStore: written 10,852B for [user] BINARY: 11,052 values, 11,764B raw, 10,786B comp, 1 pages, encodings: [BIT_PACKED, PLAIN_DICTIONARY, RLE], dic { 3,607 entries, 78,442B raw, 3,607B comp}
    date 19:03:20 INFO mapred.SparkHadoopMapRedUtil: attempt_date1903_0006_m_000013_0: Committed
    date 19:03:20 INFO executor.Executor: Finished task 13.0 in stage 6.0 (TID 1058). 2077 bytes result sent to driver
    date time:07 INFO executor.Executor: Executor is trying to kill task 70.0 in stage 6.0 (TID 1115)
    date time:07 INFO executor.Executor: Executor is trying to kill task 127.0 in stage 6.0 (TID 1170)
    Traceback (most recent call last):
      File "/usr/lib/spark/python/pyspark/daemon.py", line 157, in manager
        code = worker(sock)
      File "/usr/lib/spark/python/pyspark/daemon.py", line 61, in worker
        worker_main(infile, outfile)
      File "/usr/lib/spark/python/pyspark/worker.py", line 136, in main
        if read_int(infile) == SpecialLengths.END_OF_STREAM:
      File "/usr/lib/spark/python/pyspark/serializers.py", line 545, in read_int
        raise EOFError
    EOFError
    date time:07 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown

I'm not sure what causes this problem. It seems to fail while writing the files, yet the Spark driver has 8 GB of memory.

The machines look like this:

21 nodes, 64 GB RAM each, 8 cores each

spark-defaults.conf:

    spark.executor.memory 30928mb
    spark.driver.memory 8g
    spark.executor.instances 30
    spark.executor.cores 7
    spark.yarn.executor.memoryOverhead 19647
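
For completeness, the same values can also be supplied per application rather than cluster-wide. A minimal sketch of what I mean (my own, using the keys and values from the file above, not something I've actually deployed):

    # Per-application equivalents of the spark-defaults.conf entries above.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .set("spark.executor.memory", "30928mb")
            .set("spark.executor.instances", "30")
            .set("spark.executor.cores", "7")
            .set("spark.yarn.executor.memoryOverhead", "19647"))

    # spark.driver.memory is left out on purpose: in client mode it has to be
    # passed to spark-submit (--driver-memory 8g) before the driver JVM starts.
    sc = SparkContext(conf=conf)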

yarn-site.xml:

    yarn.scheduler.maximum-allocation-vcores = 1024
    yarn.scheduler.maximum-allocation-mb = 61430
    yarn.nodemanager.resource.memory-mb = 61430
    yarn.nodemanager.resource.cpu-vcores = 7 (one core left for the driver)
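
As a back-of-the-envelope check that one executor container even fits under these YARN limits (my own arithmetic, not taken from the logs):

    # All values in MB, copied from the settings above.
    executor_memory = 30928    # spark.executor.memory
    memory_overhead = 19647    # spark.yarn.executor.memoryOverhead
    yarn_max_alloc  = 61430    # yarn.scheduler.maximum-allocation-mb

    container_request = executor_memory + memory_overhead
    print(container_request)                    # 50575
    print(container_request <= yarn_max_alloc)  # True: one executor fits per node

    # Note: the PySpark Python worker processes run outside the executor JVM
    # heap, so whatever memory they use has to fit inside the overhead part.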

Am I missing a setting here?

In addition, the Spark master UI shows:

    URL: spark://ip:7077
    REST URL: spark://ip:6066
    Alive Workers: 30
    Cores in use: 240 Total, 0 Used
    Memory in use: 1769.7 GB Total, 0.0 B Used
    Applications: 0 Running, 0 Completed
    Drivers: 0 Running, 0 Completed
    Status: ALIVE

Is there any reason a container would need more memory? And I'm not sure whether I should increase or decrease the memory.
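
For what it's worth, one way to confirm which of these values the running application actually picked up is to dump the effective configuration from the driver; a minimal sketch:

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    # Print the memory/cores-related settings the application is really using,
    # to confirm that spark-defaults.conf was picked up.
    for key, value in sorted(sc.getConf().getAll()):
        if any(s in key for s in ("memory", "cores", "instances")):
            print(key, "=", value)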


Source: https://habr.com/ru/post/1675207/
