The Spark manual recommends using a shared static resource (such as a connection pool) in the code that runs on the workers.
Example from the manual:
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // ConnectionPool is a static, lazily initialized pool of connections
    val connection = ConnectionPool.getConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection) // return to the pool for future reuse
  }
}
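The manual does not actually define ConnectionPool. For concreteness, a minimal sketch of what such a static, lazily initialized pool could look like (the Connection trait, createConnection(), and shutdown() here are illustrative placeholders, not part of the manual):

import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical connection type; in practice this would be an HBase, JDBC,
// or socket client.
trait Connection {
  def send(record: String): Unit
  def close(): Unit
}

// One static pool per executor JVM, initialized lazily on first use.
object ConnectionPool {
  private lazy val pool = new ConcurrentLinkedQueue[Connection]()

  def getConnection(): Connection =
    Option(pool.poll()).getOrElse(createConnection())

  def returnConnection(conn: Connection): Unit = {
    pool.offer(conn)
  }

  // Closes every pooled connection. This is exactly the hook the question
  // is about: there is no obvious point in the executor lifecycle to call it.
  def shutdown(): Unit = {
    var conn = pool.poll()
    while (conn != null) {
      conn.close()
      conn = pool.poll()
    }
  }

  private def createConnection(): Connection = ??? // open the real client here
}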
What can I do if a static resource needs to be freed / closed before the executor shuts down? There is no obvious place to call its close() function. I tried a shutdown hook, but it does not seem to help.
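The attempt looked roughly like this (a reconstruction; ConnectionPool.shutdown() is the hypothetical cleanup method from the sketch above):

// Attempted cleanup via a JVM shutdown hook, registered once per executor JVM.
// This does not help: shutdown hooks only run once the JVM has already begun
// shutting down, and the JVM will not begin shutting down while the pool's
// non-daemon threads are still alive. The hook and the hang wait on each other.
sys.addShutdownHook {
  ConnectionPool.shutdown() // close pooled connections and stop client threads
}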
In fact, my job becomes a zombie: the shared resource (the HBase async client) creates a pool of non-daemon threads, so the JVM hangs indefinitely.
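To illustrate the hang mechanism itself, independent of Spark and HBase, here is a minimal standalone example: a single non-daemon thread is enough to keep a JVM process alive after main() returns.

object NonDaemonHang {
  def main(args: Array[String]): Unit = {
    val t = new Thread(() => Thread.sleep(Long.MaxValue))
    t.setDaemon(false) // the default; a daemon thread would not block JVM exit
    t.start()
    // main() returns here, but the process hangs indefinitely,
    // just like an executor JVM kept alive by the client's worker threads.
  }
}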
I am already using the proper graceful Spark Streaming shutdown, initiated from the driver:
streamingContext.stop(true, true); // stopSparkContext = true, stopGracefully = true
EDIT:
There already seems to be a Spark JIRA issue for this exact problem:
https://issues.apache.org/jira/browse/SPARK-10911