During development I ran my Spark jobs in client mode and used "--files" to share configuration files with the executors, while the driver read its configuration files locally. Now I want to deploy the job in cluster mode, and I'm having difficulty sharing the configuration files with the driver.
For example, I pass the name of the configuration file via extraJavaOptions to both the driver and the executors, and I read the file using SparkFiles.get():
val configFile = org.apache.spark.SparkFiles.get(System.getProperty("config.file.name"))
This works fine on the executors but not on the driver. I think the files are shared only with the executors, not with the container in which the driver runs. One option is to keep the configuration files in S3, but I wanted to check whether this can be achieved with spark-submit alone.
> spark-submit --deploy-mode cluster --master yarn \
>   --driver-cores 2 --driver-memory 4g \
>   --num-executors 4 --executor-cores 4 --executor-memory 10g \
>   --files /home/hadoop/Streaming.conf,/home/hadoop/log4j.properties \
>   --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
>   --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
>   --class ....
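For reference, a minimal sketch of one possible way to resolve the path on both the driver and the executors. The ConfigResolver object and the fallback to the bare file name are assumptions on my part, relying on YARN normally localizing files shipped with --files into the driver container's working directory in cluster mode; this is not the exact code of the job.

import java.nio.file.{Files, Paths}
import scala.util.Try
import org.apache.spark.SparkFiles

object ConfigResolver {
  // Resolve the name passed via -Dconfig.file.name to a readable local path.
  // Try the SparkFiles location first (works on the executors), then fall back
  // to the current working directory (where YARN is expected to localize
  // --files for the driver container in cluster mode).
  def resolveConfigPath(): String = {
    val name = System.getProperty("config.file.name")              // e.g. "Streaming.conf"
    val candidates = Try(SparkFiles.get(name)).toOption.toSeq :+ name
    candidates.find(p => Files.exists(Paths.get(p)))
      .getOrElse(sys.error(s"Configuration file '$name' not found"))
  }
}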