I have an application that orchestrates the execution of batch tasks, and I want to create a SparkSession per task, mainly to get a clean separation of registered temp views, functions, etc.
This would result in thousands of SparkSessions per day, each living only for the duration of its task (from a few minutes up to several hours). Is there any argument against doing this?
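To make the setup concrete, here is a minimal sketch of what I have in mind (names like `BatchTaskRunner` and the task loop are just illustrative): a single SparkContext per JVM, with each task getting its own SparkSession via `newSession()`, so temp views and UDF registrations stay isolated per task.

```scala
import org.apache.spark.sql.SparkSession

object BatchTaskRunner {
  def main(args: Array[String]): Unit = {
    // One SparkContext per JVM, obtained through the initial session.
    val root = SparkSession.builder()
      .appName("batch-orchestrator")
      .master("local[*]") // placeholder: replaced by the real cluster master
      .getOrCreate()

    // Each task gets its own SparkSession sharing the underlying
    // SparkContext, but with isolated temp views, UDFs and SQL config.
    val taskIds = Seq("task-1", "task-2")
    taskIds.foreach { id =>
      val session = root.newSession()
      session.range(0, 10).createOrReplaceTempView(s"input_$id")
      session.sql(s"SELECT COUNT(*) AS n FROM input_$id").show()
      // Note: not calling session.stop() here, since that would stop
      // the shared SparkContext for all other tasks as well.
    }

    root.stop()
  }
}
```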
I know that there is only one SparkContext per JVM. I also know that SparkContext does some JVM-global caching, but what exactly does that mean for this scenario? What, for example, is cached in the SparkContext, and what happens when a large number of Spark jobs have been executed through these sessions?