Multiple SparkSessions in one JVM

I have a question about creating multiple SparkSessions in one JVM. I have read that creating multiple SparkContexts is not recommended in earlier versions of Spark. Is this also true of SparkSession in Spark 2.0?

I am going to call a web service or servlet from the user interface; the service creates a SparkSession, performs some operations, and returns the result. This would create a SparkSession for each client request. Is this practice recommended?

Let's say I have a method like:

    public void runSpark() throws Exception {
        SparkSession spark = SparkSession
            .builder()
            .master("spark://<masterURL>")
            .appName("JavaWordCount")
            .getOrCreate();
        // etc....
    }

If I put this method in a web service, will there be any problems with the JVM? That way, I could call this method several times, for example from a main method, but I am not sure whether this is good practice.

+6
4 answers

It is not supported and will not be. SPARK-2243 is resolved as Won't Fix.

If you need several contexts, there are different projects that can help you (Mist, Livy).

+2

The getOrCreate documentation says:

This method first checks whether there is a valid thread-local SparkSession, and if yes, return that one. It then checks whether there is a valid global default SparkSession, and if yes, return that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default.
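
A minimal sketch of that lookup order, assuming a local master (the class and app names here are illustrative):

    import org.apache.spark.sql.SparkSession;

    public class DefaultSessionDemo {
        public static void main(String[] args) {
            // No session exists yet, so this creates one and registers it
            // as the thread-local and global default session.
            SparkSession first = SparkSession.builder()
                .master("local[*]")
                .appName("DefaultSessionDemo")
                .getOrCreate();

            // A valid default session already exists, so the same one comes back.
            SparkSession second = SparkSession.builder().getOrCreate();

            System.out.println(first == second); // true
            first.stop();
        }
    }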

There is also the SparkSession.newSession method, whose documentation says:

Start a new session with isolated SQL configurations and temporary tables; registered functions are isolated, but the underlying SparkContext and cached data are shared.

So I think the answer to your question is that you can have multiple sessions, but there is still only one SparkContext per JVM, and it will be used by all your sessions.

I could also suggest a scenario for your web application: create one SparkSession per request or, say, per HTTP session, and use it to isolate the Spark execution of each request or user session. <- Since I'm pretty new to Spark, can someone confirm this?
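
A minimal sketch of that per-request idea, assuming a local master (the handleRequest method and the view names are illustrative, not part of any framework):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class PerRequestSessions {
        // One JVM-wide base session, created once; it wraps the single SparkContext.
        private static final SparkSession BASE = SparkSession.builder()
            .master("local[*]")
            .appName("PerRequestSessions")
            .getOrCreate();

        // Hypothetical request handler: each call gets its own isolated session.
        static long handleRequest(String viewName) {
            SparkSession session = BASE.newSession();
            // Temp views registered here are invisible to other requests.
            session.range(100).createOrReplaceTempView(viewName);
            Dataset<Row> result = session.sql("SELECT COUNT(*) AS c FROM " + viewName);
            return result.first().getLong(0);
        }

        public static void main(String[] args) {
            System.out.println(handleRequest("req1")); // 100
            System.out.println(handleRequest("req2")); // 100
            BASE.stop();
        }
    }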

+7

You can call getOrCreate several times.

This function may be used to get or instantiate a SparkContext and register it as a singleton object. Because we can only have one active SparkContext per JVM, this is useful when applications may wish to share a SparkContext.

getOrCreate creates a SparkContext in the JVM if there is none yet. If a SparkContext is already available in the JVM, it does not create a new one but returns the existing one.
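
A minimal sketch of that behavior, assuming a local master:

    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkContext;

    public class GetOrCreateDemo {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                .setMaster("local[*]")
                .setAppName("GetOrCreateDemo");

            // The first call creates the JVM-wide SparkContext.
            SparkContext first = SparkContext.getOrCreate(conf);
            // The second call returns the same instance instead of creating a new one.
            SparkContext second = SparkContext.getOrCreate(conf);

            System.out.println(first == second); // true: one SparkContext per JVM
            first.stop();
        }
    }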

+4
source

If you have an existing SparkSession and want to create a new one, call the newSession method on the existing SparkSession.

    import org.apache.spark.sql.SparkSession

    val newSparkSession = spark.newSession()

The newSession method creates a new SparkSession with isolated SQL configurations and temporary tables. The new session shares the underlying SparkContext and cached data.
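
A minimal sketch of that isolation, in Java to match the question, assuming a local master (the view name nums is illustrative):

    import org.apache.spark.sql.SparkSession;

    public class NewSessionDemo {
        public static void main(String[] args) {
            SparkSession base = SparkSession.builder()
                .master("local[*]")
                .appName("NewSessionDemo")
                .getOrCreate();
            base.range(3).createOrReplaceTempView("nums");

            SparkSession isolated = base.newSession();

            // Temporary views are per-session...
            System.out.println(base.catalog().tableExists("nums"));     // true
            System.out.println(isolated.catalog().tableExists("nums")); // false
            // ...but both sessions run on the same SparkContext.
            System.out.println(base.sparkContext() == isolated.sparkContext()); // true
            base.stop();
        }
    }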

+2
