Spark 2.1.0 session configuration settings (pyspark)

I am trying to overwrite the default configuration for the Spark session / Spark context, but it picks up the entire node / cluster resources.

 spark = SparkSession.builder \
     .master("ip") \
     .enableHiveSupport() \
     .getOrCreate()

 spark.conf.set("spark.executor.memory", '8g')
 spark.conf.set('spark.executor.cores', '3')
 spark.conf.set('spark.cores.max', '3')
 spark.conf.set("spark.driver.memory", '8g')
 sc = spark.sparkContext

It works fine when I set the configuration in spark-submit:

 spark-submit --master ip --executor-cores=3 --driver-memory 10G code.py 
4 answers

You are not actually overwriting anything with this code. To see this for yourself, try the following.

Once you start the pyspark shell, type:

 sc.getConf().getAll() 

This will show you all of the current configuration settings. Then run your code and check again: nothing changes.

Instead, you should create a new configuration and use it to create a SparkContext. Do it like this:

 import pyspark

 # build a fresh conf, stop the shell's existing context, and recreate it
 conf = pyspark.SparkConf().setAll([('spark.executor.memory', '8g'),
                                    ('spark.executor.cores', '3'),
                                    ('spark.cores.max', '3'),
                                    ('spark.driver.memory', '8g')])
 sc.stop()
 sc = pyspark.SparkContext(conf=conf)

Then you can test yourself as shown above:

 sc.getConf().getAll() 

This should reflect the desired configuration.
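If you also need a SparkSession (with Hive support, as in the question) on top of the rebuilt context, a minimal sketch reusing the same conf would be:

 from pyspark.sql import SparkSession

 # sketch: getOrCreate() attaches to the SparkContext recreated above
 spark = SparkSession.builder \
     .config(conf=conf) \
     .enableHiveSupport() \
     .getOrCreate()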


Update the configuration in Spark 2.3.1

To change the default Spark settings, you can do the following:

Import the required classes

 from pyspark.conf import SparkConf
 from pyspark.sql import SparkSession

Get default settings

 spark.sparkContext._conf.getAll() 

Update default configurations

 conf = spark.sparkContext._conf.setAll([('spark.executor.memory', '4g'),
                                         ('spark.app.name', 'Spark Updated Conf'),
                                         ('spark.executor.cores', '4'),
                                         ('spark.cores.max', '4'),
                                         ('spark.driver.memory', '4g')])

Stop current Spark session

 spark.sparkContext.stop() 

Create Spark Session

 spark = SparkSession.builder.config(conf=conf).getOrCreate() 
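To confirm the new session picked up the changes, repeat the check from the "Get default settings" step; a quick sketch (the values are the ones set above):

 # sketch: the new context should now report the updated values
 spark.sparkContext.getConf().get('spark.executor.memory')   # '4g'
 spark.sparkContext.getConf().get('spark.app.name')          # 'Spark Updated Conf'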

Setting "spark.driver.host" to "localhost" in the config works for me

 spark = SparkSession \
     .builder \
     .appName("MyApp") \
     .config("spark.driver.host", "localhost") \
     .getOrCreate()
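This usually matters when the driver's hostname cannot be resolved from the machine itself (for example on a laptop that changes networks). As a sketch, you can check which host the driver actually registered:

 # sketch: inspect the host the driver bound to
 spark.sparkContext.getConf().get("spark.driver.host")   # 'localhost'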

You can also set the configuration when you start pyspark, just like with spark-submit:

 pyspark --conf property=value 

Here is one example.

 -bash-4.2$ pyspark
 Python 3.6.8 (default, Apr 25 2019, 21:02:35)
 [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
 Welcome to
       ____              __
      / __/__  ___ _____/ /__
     _\ \/ _ \/ _ '/ __/  '_/
    /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.2.0
       /_/

 Using Python version 3.6.8 (default, Apr 25 2019 21:02:35)
 SparkSession available as 'spark'.
 >>> spark.conf.get('spark.eventLog.enabled')
 'true'
 >>> exit()

 -bash-4.2$ pyspark --conf spark.eventLog.enabled=false
 Python 3.6.8 (default, Apr 25 2019, 21:02:35)
 [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
 Welcome to
       ____              __
      / __/__  ___ _____/ /__
     _\ \/ _ \/ _ '/ __/  '_/
    /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.2.0
       /_/

 Using Python version 3.6.8 (default, Apr 25 2019 21:02:35)
 SparkSession available as 'spark'.
 >>> spark.conf.get('spark.eventLog.enabled')
 'false'
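Applied to the settings from the question, the same approach would look roughly like this (a sketch: the --executor-memory / --executor-cores / --driver-memory shortcuts are standard spark-submit options, and any other property can be passed with --conf):

 # sketch: pass the question's settings at shell start-up instead of in code
 pyspark --master ip \
         --executor-memory 8g \
         --executor-cores 3 \
         --driver-memory 8g \
         --conf spark.cores.max=3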


Source: https://habr.com/ru/post/1263415/

