During development I ran my Spark jobs in client mode and used "--files" to share configuration files with the executors, while the driver read its configuration files locally. Now I want to deploy the job in cluster mode, and I'm having difficulty sharing the configuration files with the driver.
For example, I pass the name of the configuration file via extraJavaOptions to both the driver and the executors, and I read the file using SparkFiles.get():
val configFile = org.apache.spark.SparkFiles.get(System.getProperty("config.file.name"))
This works fine on the executors but not on the driver. I think the files are shared only with the executors, not with the container in which the driver runs. One option is to keep the configuration files in S3, but I wanted to check whether this can be achieved with spark-submit alone.
> spark-submit --deploy-mode cluster --master yarn \
>   --driver-cores 2 --driver-memory 4g \
>   --num-executors 4 --executor-cores 4 --executor-memory 10g \
>   --files /home/hadoop/Streaming.conf,/home/hadoop/log4j.properties \
>   --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
>   --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
>   --class ....
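For reference, a minimal sketch of one possible way to resolve the path on both the driver and the executors. The ConfigResolver object and the fallback to the bare file name are assumptions on my part, relying on YARN normally localizing files shipped with --files into the driver container's working directory in cluster mode; this is not the exact code of the job.

import java.nio.file.{Files, Paths}
import scala.util.Try
import org.apache.spark.SparkFiles

object ConfigResolver {
  // Resolve the name passed via -Dconfig.file.name to a readable local path.
  // Try the SparkFiles location first (works on the executors), then fall back
  // to the current working directory (where YARN is expected to localize
  // --files for the driver container in cluster mode).
  def resolveConfigPath(): String = {
    val name = System.getProperty("config.file.name")              // e.g. "Streaming.conf"
    val candidates = Try(SparkFiles.get(name)).toOption.toSeq :+ name
    candidates.find(p => Files.exists(Paths.get(p)))
      .getOrElse(sys.error(s"Configuration file '$name' not found"))
  }
}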