I have an application that orchestrates batch job executions, and I want to create a SparkSession per job execution, mainly to get a clean separation of registered temp views, functions, etc.
This would lead to thousands of SparkSessions per day, each living only for the duration of a job (from a few minutes up to several hours). Is there any argument against doing this?
I know that there is only one SparkContext per JVM. I also know that a SparkContext does some JVM-global caching, but what exactly does this mean for this scenario? What, for example, is cached in a SparkContext, and what happens if many Spark jobs are executed through these sessions?
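To make the scenario concrete, here is a minimal sketch of the pattern I have in mind. It assumes one long-lived root session and derives a fresh session per job with `newSession()`, which, as far as I understand the documentation, isolates SQL configuration, temp views, and registered functions while sharing the underlying SparkContext and cached data. The object name, the `runJob` helper, the app name, the `local[*]` master, and the view name are all hypothetical and only there for illustration.

```scala
import org.apache.spark.sql.SparkSession

object SessionPerJob {
  // One root session (and therefore one SparkContext) for the whole JVM.
  // App name and master are placeholders for this sketch.
  lazy val root: SparkSession = SparkSession.builder()
    .appName("batch-orchestrator")
    .master("local[*]")
    .getOrCreate()

  // Hypothetical helper: run one job against its own derived session.
  def runJob[T](body: SparkSession => T): T = {
    val session = root.newSession() // shares root's SparkContext
    body(session)
  }

  def main(args: Array[String]): Unit = {
    // First job registers a temp view in its own session.
    runJob { spark =>
      spark.range(10).createOrReplaceTempView("job_input")
      println(spark.catalog.tableExists("job_input"))
    }

    // Second job gets a different session and checks whether
    // the first job's temp view is visible from it.
    runJob { spark =>
      println(spark.catalog.tableExists("job_input"))
    }

    root.stop() // stops the shared SparkContext, so only at shutdown
  }
}
```

In this sketch the per-job sessions are never stopped individually, because stopping any of them would stop the shared SparkContext. My question is whether creating thousands of such short-lived sessions per day leaves anything behind in that context.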