Properties related to Spark deployment

When creating a Java application with Spark, the SparkConf is created like this:

SparkConf sparkConf = new SparkConf().setAppName("SparkTests")
                                     .setMaster("local[*]")
                                     .set("spark.executor.memory", "2g")
                                     .set("spark.driver.memory", "2g")
                                     .set("spark.driver.maxResultSize", "2g");

But the documentation here says:

Any values specified as flags or in the properties file will be passed on to the application and merged with those specified through SparkConf. Properties set directly on the SparkConf take the highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file. A few configuration keys have been renamed since earlier versions of Spark; in such cases, the older key names are still accepted, but take lower precedence than any instance of the newer key.

Spark properties can mainly be divided into two kinds: one is related to deployment, for example "spark.driver.memory" and "spark.executor.instances"; this kind of property may not be affected when set programmatically through SparkConf at runtime, or the behavior depends on which cluster manager and deploy mode you choose, so it is suggested to set it through the configuration file or spark-submit command-line options. The other is mainly related to Spark runtime control, for example "spark.task.maxFailures"; this kind of property can be set either way.

Does this mean that such deployment-related properties must be passed via spark-submit (or put into spark-defaults.conf) rather than set through SparkConf?

In my case the master is local[*].


In short:

Properties set directly on the SparkConf take the highest precedence; Spark merges them with the values coming from the other sources.

For deployment-related properties, however, the behavior depends on the cluster manager and deploy mode. For example, when running on YARN, the order of precedence is:

  • SparkSession.builder()
    .config(sparkConf)
    .getOrCreate() 
    

    takes the highest precedence and overrides values coming from the other sources (command line, defaults.conf); the SparkConf is applied when the session is created with getOrCreate (see the sketch after this list),

  • then flags passed to spark-submit or spark-shell on the command line (which in turn override defaults.conf),

  • and finally options in defaults.conf, which have the lowest precedence.
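
As an illustration of the first point, here is a minimal sketch (the class name ConfPrecedenceDemo is made up for the example) showing that a runtime-control property set programmatically is the value the session actually reports, even if the same key is also supplied via --conf on spark-submit or in spark-defaults.conf:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class ConfPrecedenceDemo {
    public static void main(String[] args) {
        // spark.task.maxFailures is a runtime-control property, so the value set
        // here takes precedence over the command line and spark-defaults.conf.
        SparkConf sparkConf = new SparkConf()
                .setAppName("ConfPrecedenceDemo")
                .setMaster("local[*]") // only so the sketch runs standalone
                .set("spark.task.maxFailures", "8");

        SparkSession session = SparkSession.builder()
                .config(sparkConf)
                .getOrCreate();

        // Prints 8 - the programmatic value - regardless of any
        // --conf spark.task.maxFailures=... passed to spark-submit.
        System.out.println(session.conf().get("spark.task.maxFailures"));

        session.stop();
    }
}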

The exception is deployment-related properties such as "spark.driver.memory" and "spark.executor.instances": these cannot be changed programmatically at runtime and should be set in the configuration file or via spark-submit command-line options.
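
Applied to the configuration from the question, a possible split looks like the sketch below (the class name, jar name, and spark-submit invocation are just placeholders, assuming the application is launched with spark-submit): deployment-related settings go on the command line, and only runtime-control settings stay in the code. In local[*] mode in particular, setting "spark.driver.memory" in SparkConf has no effect, because the driver JVM has already started by the time that code runs; use --driver-memory or spark-defaults.conf instead.

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

// Deployment-related properties are fixed before the driver JVM starts, so they
// belong on the spark-submit command line (or in spark-defaults.conf), e.g.:
//
//   spark-submit --class SparkTests \
//     --driver-memory 2g --executor-memory 2g \
//     --conf spark.driver.maxResultSize=2g \
//     spark-tests.jar
public class SparkTests {
    public static void main(String[] args) {
        // Only runtime-control properties are set programmatically; the memory
        // settings above are already in effect by the time this code runs.
        SparkConf sparkConf = new SparkConf()
                .setAppName("SparkTests")
                .set("spark.task.maxFailures", "8");

        SparkSession session = SparkSession.builder()
                .config(sparkConf)
                .getOrCreate();

        // ... application logic ...

        session.stop();
    }
}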


Source: https://habr.com/ru/post/1694451/

