Spark shutdown - how to free up shared resources

The Spark Streaming manual recommends using a shared static resource (such as a connection pool) inside the worker code.

Example from the manual:

    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { partitionOfRecords =>
        // ConnectionPool is a static, lazily initialized pool of connections
        val connection = ConnectionPool.getConnection()
        partitionOfRecords.foreach(record => connection.send(record))
        ConnectionPool.returnConnection(connection) // return to the pool for future reuse
      }
    }
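
For context, a minimal sketch of what such a static pool might look like on the executor side. ConnectionPool is the manual's own placeholder; the Connection trait and newConnection factory below are my stand-ins, not a real API:

    import java.util.concurrent.ConcurrentLinkedQueue

    // Stand-in for whatever client the pool hands out.
    trait Connection {
      def send(record: String): Unit
      def close(): Unit
    }

    object ConnectionPool {
      // One instance per executor JVM; nothing is created until the
      // first task on that executor asks for a connection.
      private val pool = new ConcurrentLinkedQueue[Connection]()

      def getConnection(): Connection =
        Option(pool.poll()).getOrElse(newConnection())

      def returnConnection(conn: Connection): Unit =
        pool.offer(conn)

      // Hypothetical factory; a real implementation would build the
      // actual client here (and this is where a library can spawn
      // non-daemon threads behind your back).
      private def newConnection(): Connection = new Connection {
        def send(record: String): Unit = println(s"sending $record")
        def close(): Unit = ()
      }
    }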

What should I do if a static resource must be freed/closed before the executor shuts down? There is no obvious place to call close(). I tried a stop hook, but it doesn't seem to help.
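
For reference, the stop hook I tried was roughly the following; ConnectionPool.shutdown() is a hypothetical method that would close the underlying client:

    // Registered once per executor JVM, e.g. from ConnectionPool's
    // static initializer. It cannot help here: a shutdown hook only
    // runs once JVM shutdown has begun, and shutdown never begins
    // while the client's non-daemon threads are still alive.
    sys.addShutdownHook {
      ConnectionPool.shutdown() // hypothetical: close() every pooled connection
    }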

In fact, my application becomes a zombie because I use a shared resource that creates a pool of non-daemon threads (the HBase async client), which means the JVM hangs indefinitely.
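
The hang is easy to reproduce without HBase at all; a minimal sketch, where AsyncClient is a made-up stand-in for the async client:

    import java.util.concurrent.{Executors, TimeUnit}

    object AsyncClient {
      // Executors.defaultThreadFactory() produces NON-daemon threads,
      // so this pool keeps the JVM alive until shutdown() is called.
      lazy val workers = Executors.newFixedThreadPool(4)
    }

    object Main extends App {
      AsyncClient.workers.submit(new Runnable {
        def run(): Unit = TimeUnit.SECONDS.sleep(1)
      })
      println("main is done, but the JVM hangs here...")
      // Uncommenting the next line would let the JVM exit cleanly:
      // AsyncClient.workers.shutdown()
    }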

I am using the correct Spark Streaming shutdown, initiated from the driver:

 streamingContext.stop(true, true); 
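
For clarity, the two flags of StreamingContext.stop are stopSparkContext and stopGracefully; with named arguments the call reads:

    // Stops the underlying SparkContext as well, and waits for the
    // processing of all received data to finish before shutting down.
    streamingContext.stop(stopSparkContext = true, stopGracefully = true)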

EDIT:

There already seems to be a Spark JIRA issue for this exact problem:

https://issues.apache.org/jira/browse/SPARK-10911
