Spark always uses an existing SparkContext every time I submit a job with spark-submit

I deploy a Spark job on the cluster. I submit the job using the spark-submit command with my project's jar.

I have several SparkConf configurations in my project. The conf is chosen based on which class I run, but every time I run a Spark job I get this warning:

7/01/09 07:32:51 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.

Question 1: Does this mean that a SparkContext already exists and my job picks it up? Question 2: Why do the configurations not take effect?
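To show what I mean, here is a minimal standalone sketch (hypothetical, not part of my project) where a second getOrCreate() in the same JVM reuses the context created by the first call:

import org.apache.spark.sql.SparkSession

object GetOrCreateDemo {
  def main(args: Array[String]): Unit = {
    val first = SparkSession.builder()
      .master("local[2]")
      .appName("first")
      .getOrCreate()

    // Same JVM: this does not start a new context; it returns the session
    // created above and warns that some configuration may not take effect.
    val second = SparkSession.builder()
      .master("local[4]")                           // ignored, the context is already running
      .config("spark.sql.shuffle.partitions", "8")  // SQL option, still applied to the session
      .getOrCreate()

    println(first eq second)                                  // true
    println(second.sparkContext.getConf.get("spark.master"))  // local[2]

    first.stop()
  }
}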

The code

private val conf = new SparkConf()
    .setAppName("ELSSIE_Ingest_Cassandra")
    .setMaster(sparkIp)
    .set("spark.sql.shuffle.partitions", "8")
    .set("spark.cassandra.connection.host", cassandraIp)
    .set("spark.sql.crossJoin.enabled", "true")


object SparkJob extends Enumeration {

  val Program1, Program2, Program3, Program4, Program5 = Value
}

object ElssieCoreContext {

  def getSparkSession(sparkJob: SparkJob.Value = SparkJob.Program1): SparkSession = {
    val sparkSession = sparkJob match {
      case SparkJob.Program1 => {
        val updatedConf = conf.set("spark.cassandra.output.batch.size.bytes", "2048").set("spark.sql.broadcastTimeout", "2000")
        SparkSession.builder().config(updatedConf).getOrCreate()
      }
      case SparkJob.Program2 => {
        val updatedConf = conf.set("spark.sql.broadcastTimeout", "2000")
        SparkSession.builder().config(updatedConf).getOrCreate()
      }
    }
    sparkSession
  }
}

And in Program1.scala I call

val spark = ElssieCoreContext.getSparkSession()
val sc = spark.sparkContext
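
To check what actually ends up on the session, I print the effective configuration right after obtaining it (a verification snippet only, using the spark and sc values from above):

// Verification only: dump the session's runtime conf and the underlying
// SparkContext conf to see which of my settings actually took effect.
spark.conf.getAll.foreach { case (k, v) => println(s"$k = $v") }
println(sc.getConf.toDebugString)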

Source: https://habr.com/ru/post/1666220/

