Spark: running jobs with different memory / core configurations from a single JVM at the same time

Explanation of the problem

Suppose you have a Spark cluster with the standalone cluster manager, where jobs are scheduled through a SparkSession created in the client application. The client application runs on a JVM, and for performance reasons each job has to be launched with a different configuration; see the examples of job types below.

The problem is that you cannot create two such sessions from the same JVM: a SparkSession is tied to the single SparkContext living in that JVM, and the executor settings are fixed when that context is created.

So, how are you going to run several Spark jobs with different session configurations at the same time?

By different session configurations I mean the following (a minimal sketch of the limitation follows this list):

  • spark.executor.cores
  • spark.executor.memory
  • spark.kryoserializer.buffer.max
  • spark.scheduler.pool
  • etc.
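
To make the limitation concrete, here is a minimal sketch (the standalone master URL, app name and object name are made up for illustration). The executor settings are fixed when the first SparkContext is created in the JVM; a second SparkSession.builder().getOrCreate() in the same JVM just reuses that context, so the new executor settings are not applied:

```scala
import org.apache.spark.sql.SparkSession

object SingleJvmLimitation {
  def main(args: Array[String]): Unit = {
    // First session: executor memory/cores are fixed when the underlying
    // SparkContext is created in this JVM.
    val heavy = SparkSession.builder()
      .master("spark://master:7077")          // hypothetical standalone master URL
      .appName("heavy-job")
      .config("spark.executor.cores", "4")
      .config("spark.executor.memory", "8g")
      .getOrCreate()

    // Second attempt in the same JVM: getOrCreate() returns a session backed
    // by the SAME SparkContext; Spark only logs a warning and the new
    // executor settings do not take effect on the running context.
    val light = SparkSession.builder()
      .config("spark.executor.cores", "1")
      .config("spark.executor.memory", "1g")
      .getOrCreate()

    // Both sessions report the configuration of the first context.
    println(heavy.sparkContext.getConf.get("spark.executor.memory"))  // 8g
    println(light.sparkContext.getConf.get("spark.executor.memory"))  // 8g

    heavy.stop()
  }
}
```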

My thoughts

Possible solutions to the problem:

  • Set the settings per job inside the single SparkSession of the client application. Is that even possible?
  • Launch a separate JVM with its own SparkSession for each kind of Spark job (see the sketch further below). The downside: 2-3 extra driver JVMs have to be kept around, one per configuration, which is not very flexible.
  • Use a separate scheduler pool per job type. But pools only divide the resources of one context between jobs; the executor memory and cores stay the same.
  • Move the data out of Spark into an external in-memory store (for example Hazelcast) and let a separate Spark application with its own configuration pick it up. The downside: extra moving parts such as serialization/deserialization, and so on.

  • Some job types are IO-bound while others are not, so they benefit from different executor settings.
  • At any given moment only 1-2 of the 3 job types run at the same time, but that may change.
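
If the "separate JVM per configuration" option is chosen, one way to sketch it without hand-rolled process management is Spark's SparkLauncher API, which starts each driver as its own process. This is only an illustration under assumed paths and names (Spark home, master URL, job jar, main class and pool name are all hypothetical):

```scala
import org.apache.spark.launcher.SparkLauncher

// Sketch of the "separate JVM per configuration" idea: the client JVM only
// submits the job; the driver with its own SparkConf runs as a new process.
object SubmitWithOwnConfig {
  def main(args: Array[String]): Unit = {
    val handle = new SparkLauncher()
      .setSparkHome("/opt/spark")             // assumption: Spark installed locally
      .setMaster("spark://master:7077")       // hypothetical standalone master URL
      .setAppResource("/jobs/heavy-job.jar")  // hypothetical job artifact
      .setMainClass("com.example.HeavyJob")   // hypothetical entry point
      .setConf("spark.executor.memory", "8g")
      .setConf("spark.executor.cores", "4")
      .setConf("spark.scheduler.pool", "heavy")
      .startApplication()                     // non-blocking, returns a SparkAppHandle

    // The client keeps running; poll the handle until the job reaches a final state.
    while (!handle.getState.isFinal) Thread.sleep(1000)
    println(s"Job finished with state ${handle.getState}")
  }
}
```

Each submission gets its own driver JVM, so its executor memory, cores and scheduler pool are completely independent of the client application's own SparkContext.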

Spark standalone mode uses a simple FIFO scheduler across applications. By default an application tries to acquire all cores in the cluster, so to run several applications side by side you have to cap what each one may consume (spark.cores.max, spark.executor.memory and so on) in its SparkConf.
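
For example, if every job type is submitted as its own standalone application (each from its own driver process), the per-application cap could look like the following sketch; the master URL, object name and numbers are illustrative:

```scala
import org.apache.spark.sql.SparkSession

object IoBoundJob {
  def main(args: Array[String]): Unit = {
    // Cap what this one application may take from the standalone cluster,
    // so applications with other configurations can run alongside it.
    val spark = SparkSession.builder()
      .master("spark://master:7077")          // hypothetical standalone master URL
      .appName("io-bound-job")
      .config("spark.cores.max", "8")         // total cores this application may use
      .config("spark.executor.cores", "2")
      .config("spark.executor.memory", "4g")
      .getOrCreate()

    // ... the actual job goes here ...
    spark.stop()
  }
}
```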

Apache Mesos uses two-level scheduling. Every framework (for example a Spark application running on Apache Mesos) registers with the Mesos master; the master makes resource offers to the registered frameworks, and each framework decides for itself which offers to accept and which to decline. That is how Apache Mesos shares one cluster between frameworks with different resource requirements. For Spark on Apache Mesos this means each application is submitted as its own framework with its own configuration, and in coarse-grained mode you again cap the cores it may take with settings such as spark.cores.max.
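
A comparable sketch for Spark on Mesos: coarse-grained mode plus a spark.cores.max cap keeps one application from taking every offer, so applications with other configurations can run next to it (the ZooKeeper-based master URL, object name and numbers are made up):

```scala
import org.apache.spark.sql.SparkSession

object JobOnMesos {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("mesos://zk://zk1:2181,zk2:2181/mesos") // hypothetical Mesos master via ZooKeeper
      .appName("job-on-mesos")
      .config("spark.mesos.coarse", "true")   // coarse-grained mode: long-lived executors
      .config("spark.cores.max", "8")         // do not accept more cores than this
      .config("spark.executor.memory", "4g")
      .getOrCreate()

    // ... the actual job goes here ...
    spark.stop()
  }
}
```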

In Apache Hadoop YARN the ResourceManager has two main components: the Scheduler and the ApplicationsManager. The Scheduler is pluggable; the common implementations are the CapacityScheduler, which lets several tenants share the cluster through queues with capacity guarantees, and the FairScheduler, which tries to give all running applications an equal share of resources over time. The Scheduler only allocates resources and does not monitor or restart applications. The ApplicationsManager accepts job submissions and negotiates the first container, in which the application-specific ApplicationMaster is started; for Spark the ApplicationMaster is part of the Spark application itself. The per-application executor settings are still taken from the SparkConf at submit time.
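
And the YARN counterpart: every submission becomes its own YARN application, the queue determines how the CapacityScheduler or FairScheduler treats it, and the executor sizing still comes from the SparkConf. The queue name, object name and sizes below are made up, and a Hadoop/YARN client configuration on the classpath is assumed:

```scala
import org.apache.spark.sql.SparkSession

object JobOnYarn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("yarn")                           // needs HADOOP_CONF_DIR / YARN_CONF_DIR set
      .appName("job-on-yarn")
      .config("spark.yarn.queue", "heavy")      // hypothetical CapacityScheduler/FairScheduler queue
      .config("spark.executor.memory", "8g")
      .config("spark.executor.cores", "4")
      .config("spark.executor.instances", "10")
      .getOrCreate()

    // ... the actual job goes here ...
    spark.stop()
  }
}
```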

So, whichever cluster manager you use, the practical way to get different memory/core settings is to submit each job type as a separate application with its own SparkConf and let the manager's scheduler share the cluster between them.


Source: https://habr.com/ru/post/1015838/

