Spark executors crashing due to a Netty memory leak

When we launch a Spark Streaming application that consumes data from a Kafka topic with 100 partitions, running with 10 executors, 5 cores and 20 GB of RAM per executor, the executors crash with the following log:

ERROR ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred.

ERROR YarnClusterScheduler: Lost executor 18 on worker23.oct.com: Slave lost

ERROR ApplicationMaster: RECEIVED SIGNAL TERM
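For context, the job looks roughly like this. It is a minimal sketch rather than the real code: it assumes the spark-streaming-kafka-0-10 direct stream API, and the app name, topic, brokers, group id, batch interval, and per-record processing are all placeholders:

// Hypothetical sketch of the streaming job described above.
// Submitted with the resources from the question, i.e. something like:
//   spark-submit --num-executors 10 --executor-cores 5 --executor-memory 20g ...
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object StreamingJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-streaming-job") // placeholder name
    val ssc = new StreamingContext(conf, Seconds(10))            // placeholder batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",               // placeholder brokers
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "streaming-job",                       // placeholder group id
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // One direct stream over the 100-partition topic; no cache()/persist() anywhere.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        records.foreach(_ => ()) // placeholder for the real per-record processing
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}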

This exception shows up in the Spark JIRA:

https://issues.apache.org/jira/browse/SPARK-17380

and someone wrote there that the problem went away after upgrading to Spark 2.0.2. However, we run Spark 2.1 as part of HDP 2.6 and still hit the leak, so I assume this bug is not actually fixed in Spark 2.1.
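In the meantime, the obvious next step is the advanced leak reporting that the first log line suggests, to see where the leaked ByteBuf instances are coming from. A minimal sketch of turning it on, assuming the executors run the Netty 4.0.x that Spark 2.1 bundles (which reads the io.netty.leakDetectionLevel JVM property; Netty 4.1 renamed it to io.netty.leakDetection.level):

import org.apache.spark.SparkConf

// The driver option has to go on the spark-submit command line, since the
// driver JVM is already running by the time SparkConf is evaluated:
//   spark-submit --driver-java-options "-Dio.netty.leakDetectionLevel=advanced" ...
// The executor option can be set in code before the StreamingContext is created:
val conf = new SparkConf()
  .setAppName("kafka-streaming-job") // placeholder, as in the sketch above
  .set("spark.executor.extraJavaOptions", "-Dio.netty.leakDetectionLevel=advanced")

With that level enabled, the ResourceLeakDetector error should include recent access records for the leaked buffers, which would at least show whether the leak sits in Spark's network layer or in our own code.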

Someone else ran into this error and posted about it on the Spark users mailing list, but got no answer:

http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Receiver-Resource-Leak-td27857.html

BTW, the streaming application does not call cache() or persist(), so no caching is involved.

Has anyone come across a streaming application crashing with this error?

