Ever-increasing physical memory for a Spark application on YARN

I am running a Spark application on YARN with two executors, with Xms/Xmx set to 32 GB and spark.yarn.executor.memoryOverhead set to 6 GB.
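
For reference, a minimal sketch of the submit command these settings imply (Spark 1.x on YARN flag names; the jar name is a placeholder, and the JVM tuning flags visible in the container command line below would come from spark.executor.extraJavaOptions and are omitted here):

    spark-submit \
      --master yarn-cluster \
      --num-executors 2 \
      --executor-memory 32g \
      --conf spark.yarn.executor.memoryOverhead=6144 \
      my-spark-sql-app.jar   # placeholder application jar

    # YARN sizes each executor container as executor memory + overhead,
    # i.e. 32 GB + 6 GB = 38 GB, which matches the "38 GB physical memory"
    # limit in the NodeManager log below.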

I see that the application's physical memory keeps growing until the container is eventually killed by the node manager:

2015-07-25 15:07:05,354 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=10508,containerID=container_1437828324746_0002_01_000003] is running beyond physical memory limits. Current usage: 38.0 GB of 38 GB physical memory used; 39.5 GB of 152 GB virtual memory used. Killing container.
Dump of the process-tree for container_1437828324746_0002_01_000003 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 10508 9563 10508 10508 (bash) 0 0 9433088 314 /bin/bash -c /usr/java/default/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms32768m -Xmx32768m  -Dlog4j.configuration=log4j-executor.properties -XX:MetaspaceSize=512m -XX:+UseG1GC -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:gc.log -XX:AdaptiveSizePolicyOutputInterval=1  -XX:+UseGCLogFileRotation -XX:GCLogFileSize=500M -XX:NumberOfGCLogFiles=1 -XX:MaxDirectMemorySize=3500M -XX:NewRatio=3 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=36082 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -XX:NativeMemoryTracking=detail -XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=512m -XX:CompressedClassSpaceSize=256m -Djava.io.tmpdir=/data/yarn/datanode/nm-local-dir/usercache/admin/appcache/application_1437828324746_0002/container_1437828324746_0002_01_000003/tmp '-Dspark.driver.port=43354' -Dspark.yarn.app.container.log.dir=/opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_000003 org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://sparkDriver@nn1:43354/user/CoarseGrainedScheduler 1 dn3 6 application_1437828324746_0002 1> /opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_000003/stdout 2> /opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_000003/stderr

I turned off the YARN parameter "yarn.nodemanager.pmem-check-enabled" and noticed that physical memory usage went up to 40 GB.
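
Turning the check off amounts to setting that property to false in yarn-site.xml on each NodeManager and restarting it; a quick way to confirm what a node currently has configured (assuming a standard HADOOP_CONF_DIR layout) is:

    # Print the property name and the line that follows it (normally its <value>):
    grep -A 1 'yarn.nodemanager.pmem-check-enabled' "$HADOOP_CONF_DIR/yarn-site.xml"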

I checked the total RSS in /proc/&lt;pid&gt;/smaps, and it matched the physical memory reported by YARN and seen in the top command.
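
For reference, one way to total the Rss entries from smaps (a sketch; 10508 is just the PID from the process tree above and will differ per run):

    # Sum all Rss: entries in smaps (values are in kB) and print the total in GB:
    awk '/^Rss:/ {sum += $2} END {printf "total RSS: %.1f GB\n", sum/1024/1024}' /proc/10508/smaps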

I have verified that this is not a heap problem; something is growing off-heap, in native memory. I used tools like VisualVM, but could not find anything growing there. MaxDirectMemory also did not exceed 600 MB. The number of active threads peaked at 70-80, and the thread stacks did not take more than 100 MB. Metaspace was around 60-70 MB.
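
Since the executor already runs with -XX:NativeMemoryTracking=detail (see the container command line above), jcmd can break the JVM's native memory down by category, and diffing a baseline against a later snapshot shows which category grows. A sketch, with 10508 standing in for the executor JVM's PID:

    # Record a baseline, let the job run for a while, then diff against it:
    jcmd 10508 VM.native_memory baseline
    # ... wait while the application keeps running ...
    jcmd 10508 VM.native_memory summary.diff

Note that NMT only covers memory the JVM itself allocates; memory obtained by native libraries loaded over JNI (compression codecs, for example) will not show up in these numbers.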

FYI, I'm on Spark 1.2 and Hadoop 2.4.0. My Spark application is based on Spark SQL, and it is an HDFS read/write intensive application that caches data in Spark SQL's in-memory cache.

Where should I look to debug this memory leak, or is there a tool already available for that?

+4
1 answer

The problem turned out to be in Spark SQL's Parquet write path: the compressors were not being recycled, so native (off-heap) memory kept growing with every write.

The Parquet Jira and PR that fix this issue:

https://issues.apache.org/jira/browse/PARQUET-353

This fixed the problem for me.

+1

Source: https://habr.com/ru/post/1599817/

