How to use a custom configuration file for SparkSession (without using spark-submit to submit an application)?

I have a standalone Python script that creates a SparkSession by invoking the following line of code, and I can see that it picks up the settings from the file spark-defaults.conf.

spark = SparkSession.builder.appName("Tester").enableHiveSupport().getOrCreate()

If I want to use another file containing the Spark configuration instead of spark-defaults.conf, how can I specify it when creating the SparkSession?

I see that I can pass a SparkConf object, but is there a way to build one automatically from a file containing all the configuration properties?

Do I have to manually parse the input file and set the appropriate configuration manually?
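For reference, manually parsing a properties-style file is straightforward. Below is a minimal sketch; the `parse_spark_conf` helper is illustrative, not part of the PySpark API:

```python
def parse_spark_conf(path):
    """Parse a spark-defaults.conf-style properties file into a dict.

    Each line has the form: spark.executor.memory  4g
    Comments (#) and blank lines are skipped.
    """
    conf = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)  # split on first run of whitespace
            if len(parts) == 2:
                conf[parts[0]] = parts[1].strip()
    return conf

# The resulting pairs could then be fed to SparkConf, e.g.:
# from pyspark import SparkConf
# conf = SparkConf().setAll(parse_spark_conf("my-spark.conf").items())
# spark = SparkSession.builder.config(conf=conf).getOrCreate()
```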

1 answer

If you are not using spark-submit, your best option is to override SPARK_CONF_DIR. Create a separate directory for each set of configurations:

$ cd configs && tree
.
├── conf1
│   ├── docker.properties
│   ├── fairscheduler.xml
│   ├── log4j.properties
│   ├── metrics.properties
│   ├── spark-defaults.conf
│   ├── spark-defaults.conf.template
│   └── spark-env.sh
└── conf2
    ├── docker.properties
    ├── fairscheduler.xml
    ├── log4j.properties
    ├── metrics.properties
    ├── spark-defaults.conf
    ├── spark-defaults.conf.template
    └── spark-env.sh

Then set the environment variable before initializing any JVM-dependent objects:

import os
from pyspark.sql import SparkSession

os.environ["SPARK_CONF_DIR"] = "/path/to/configs/conf1"
spark = SparkSession.builder.getOrCreate()

or

import os
from pyspark.sql import SparkSession

os.environ["SPARK_CONF_DIR"] = "/path/to/configs/conf2"
spark = SparkSession.builder.getOrCreate()

This is a workaround and may not work in complex scenarios.
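If you switch between configuration sets often, a small helper makes the ordering constraint explicit: the variable must be set before the first SparkSession (and hence the JVM) is created, since changing it afterwards has no effect on a running session. The helper name below is illustrative:

```python
import os

def use_spark_conf_dir(path):
    """Point Spark at a configuration directory.

    Call this before the first SparkSession is created; Spark reads
    SPARK_CONF_DIR only when the JVM starts.
    """
    if not os.path.isdir(path):
        raise FileNotFoundError(f"No such configuration directory: {path}")
    os.environ["SPARK_CONF_DIR"] = path

# use_spark_conf_dir("/path/to/configs/conf1")
# spark = SparkSession.builder.getOrCreate()
```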


Source: https://habr.com/ru/post/1693308/