How to pass a -D parameter or environment variable to a Spark job?

I want to change the Typesafe Config of a Spark job between dev and prod environments. It seems to me that the easiest way to accomplish this is to pass -Dconfig.resource=ENVNAME to the job. Then the Typesafe Config library will do the rest for me.

Is there any way to pass this option directly to the job? Or maybe there is a better way to change the configuration of a job at runtime?

EDIT:

  • Nothing happens when I add --conf "spark.executor.extraJavaOptions=-Dconfig.resource=dev" to the spark-submit command.
  • I get Error: Unrecognized option '-Dconfig.resource=dev' when I pass -Dconfig.resource=dev directly to spark-submit.
+49
scala apache-spark
Jan 27 '15 at 9:06
7 answers

Modify the spark-submit command line by adding three parameters:

  • --files <location_to_your_app.conf>
  • --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=app'
  • --conf 'spark.driver.extraJavaOptions=-Dconfig.resource=app'
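
Combined into a single command, this could look like the sketch below; the class name, jar, and .conf path are placeholders for your own job, and app is the config resource name used in the options above:

 # --files ships app.conf alongside the job; the -Dconfig.resource options
 # point Typesafe Config at it on both the driver and the executors.
 spark-submit \
   --class com.example.MyJob \
   --files /path/to/app.conf \
   --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=app' \
   --conf 'spark.driver.extraJavaOptions=-Dconfig.resource=app' \
   my-job.jar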
+33
Jan 29 '15 at 12:09

Here is my spark-submit command with the Java options added:

 /home/spark/spark-1.6.1-bin-hadoop2.6/bin/spark-submit \
   --files /home/spark/jobs/fact_stats_ad.conf \
   --conf spark.executor.extraJavaOptions=-Dconfig.fuction.conf \
   --conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH -Dalluxio.user.file.write.location.policy.class=alluxio.client.file.policy.MostAvailableFirstPolicy -Dconfig.file=/home/spark/jobs/fact_stats_ad.conf' \
   --class jobs.DiskDailyJob \
   --packages com.databricks:spark-csv_2.10:1.4.0 \
   --jars /home/spark/jobs/alluxio-core-client-1.2.0-RC2-jar-with-dependencies.jar \
   --driver-memory 2g \
   /home/spark/jobs/convert_to_parquet.jar \
   AD_COOKIE_REPORT FACT_AD_STATS_DAILY | tee /data/fact_ad_stats_daily.log

As you can see, --files /home/spark/jobs/fact_stats_ad.conf ships the custom configuration file.

The executor Java options: --conf spark.executor.extraJavaOptions=-Dconfig.fuction.conf

The driver Java options: --conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH -Dalluxio.user.file.write.location.policy.class=alluxio.client.file.policy.MostAvailableFirstPolicy -Dconfig.file=/home/spark/jobs/fact_stats_ad.conf'

Hope this helps.

+13
Aug 26 '16 at 10:37

I had a lot of problems passing -D parameters to the executors and the driver. To quote from my blog post about this: the right way to pass the parameters is through the properties spark.driver.extraJavaOptions and spark.executor.extraJavaOptions. I passed both the log4j configuration property and a parameter I needed for my configurations (to the driver I could only pass the log4j configuration). For example, the following was written in the properties file passed to spark-submit with "--properties-file":

 spark.driver.extraJavaOptions -Dlog4j.configuration=file:///spark/conf/log4j.properties
 spark.executor.extraJavaOptions -Dlog4j.configuration=file:///spark/conf/log4j.properties -Dapplication.properties.file=hdfs:///some/path/on/hdfs/app.properties
 spark.application.properties.file hdfs:///some/path/on/hdfs/app.properties

"

You can read my blog post about common Spark configurations for more details. I run on YARN as well.
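
For reference, a minimal sketch of how such a properties file gets wired in; the file and jar names here are placeholders, and --properties-file is spark-submit's standard flag for supplying extra Spark properties:

 # job.properties holds the spark.driver.extraJavaOptions /
 # spark.executor.extraJavaOptions lines shown above.
 spark-submit \
   --properties-file job.properties \
   --class com.example.MyJob \
   my-job.jar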

+7
Jan 27 '15 at 18:00

--files <location_to_your_app.conf> --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=app' --conf 'spark.driver.extraJavaOptions=-Dconfig.resource=app'

If you write it this way, a later --conf for the same key overwrites the previous one; you can check this by looking at the Spark UI's Environment tab after the job starts.

So the correct way is to put all the options for one key on a single line, like this: --conf 'spark.executor.extraJavaOptions=-Da=b -Dc=d'. If you do this, you will find that all your settings show up in the Spark UI.
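
As a minimal sketch of the difference (the -D keys and values, class, and jar names are made-up placeholders):

 # Repeating the same key: only the last --conf for that key survives.
 spark-submit --class com.example.MyJob \
   --conf 'spark.executor.extraJavaOptions=-Da=b' \
   --conf 'spark.executor.extraJavaOptions=-Dc=d' \
   my-job.jar    # -Da=b is silently dropped

 # Correct: combine all the -D options into a single value for the key.
 spark-submit --class com.example.MyJob \
   --conf 'spark.executor.extraJavaOptions=-Da=b -Dc=d' \
   my-job.jar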

+6
May 9 '17 at 20:39

I launch my Spark application via a spark-submit command started from another Scala application. So I have an array like

 Array(".../spark-submit", ..., "--conf", confValues, ...) 

where confValues is:

  • for yarn-cluster mode:
    "spark.driver.extraJavaOptions=-Drun.mode=production -Dapp.param=..."
  • for local[*] mode:
    "run.mode=development"

It is a bit tricky to work out where (and where not) to escape quotes and spaces. You can check the Spark web UI for the system property values.
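
For reference, a rough sketch of what the yarn-cluster variant flattens to on the command line once the array is handed to spark-submit; only the run.mode property is shown, and the class and jar names are placeholders:

 spark-submit \
   --master yarn \
   --deploy-mode cluster \
   --conf 'spark.driver.extraJavaOptions=-Drun.mode=production' \
   --class com.example.MyJob \
   my-job.jar

When the whole value is passed as a single array element from Scala, no shell quoting is needed; the quotes above are only required when typing the command into a shell.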

+2
Jan 28 '15 at 0:56

Using a command like the one below may be useful to you:

 spark-submit --master local[2] \
   --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/tmp/log4j.properties' \
   --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/tmp/log4j.properties' \
   --class com.test.spark.application.TestSparkJob \
   target/application-0.0.1-SNAPSHOT-jar-with-dependencies.jar prod

I tried it and it worked for me. I would also advise going through the Spark page linked below, which is really useful: https://spark.apache.org/docs/latest/running-on-yarn.html

0
Nov 24 '17 at 2:31

I originally had this configuration file:

 my-app {
   environment: dev
   other: xxx
 }

This is how I load the configuration in my Scala Spark code:

 import java.io.File
 import com.typesafe.config.ConfigFactory

 val config = ConfigFactory
   .parseFile(new File("my-app.conf"))
   .withFallback(ConfigFactory.load())
   .resolve()
   .getConfig("my-app")

With this setup, and even though the Typesafe Config documentation and all the other answers say it should work, overriding via a system property did not work for me when I launched my Spark job as follows:

 spark-submit \
   --master yarn \
   --deploy-mode cluster \
   --name my-app \
   --driver-java-options='-XX:MaxPermSize=256M -Dmy-app.environment=prod' \
   --files my-app.conf \
   my-app.jar

To make it work, I had to change my configuration file to:

 my-app {
   environment: dev
   environment: ${?env.override}
   other: xxx
 }

and then run it like this:

 spark-submit \
   --master yarn \
   --deploy-mode cluster \
   --name my-app \
   --driver-java-options='-XX:MaxPermSize=256M -Denv.override=prod' \
   --files my-app.conf \
   my-app.jar
0
Dec 28 '17 at 17:03


