How to set up a custom environment variable in EMR for use in a Spark application

I need to set a custom environment variable in EMR that will be available when the Spark application starts.

I tried to add this:

```shell
... --configurations '[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Configurations": [],
        "Properties": {
          "SOME-ENV-VAR": "qa1"
        }
      }
    ],
    "Properties": {}
  }
]' ...
```

I also tried replacing `spark-env` with `hadoop-env`, but nothing works.

There is this answer on the AWS forums, but I can't understand how to apply it. I run EMR 5.3.1 and launch it with a pre-configured step from the CLI: `aws emr create-cluster...`

2 answers

Add your custom configuration as JSON to a file, say custom_config.json:

```json
[
  {
    "Classification": "spark-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "VARIABLE_NAME": "VARIABLE_VALUE"
        }
      }
    ]
  }
]
```

Then, when creating the EMR cluster, pass the file path to the `--configurations` option:

```shell
aws emr create-cluster --configurations file://custom_config.json --other-options...
```
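Once the cluster comes up with this configuration, the exported variable should be visible to the Spark application's processes as an ordinary environment variable. A minimal sketch of reading it from a PySpark driver script (`VARIABLE_NAME` is just the placeholder name from the config above, not something EMR defines):

```python
import os

# Read the environment variable exported via the spark-env "export"
# classification. VARIABLE_NAME is the placeholder from the sample
# config above; substitute your own variable name.
value = os.environ.get("VARIABLE_NAME", "not set")
print(value)
```

Note that this reads the driver's environment; executors run in their own processes, which is one reason the classification has to be applied cluster-wide rather than exported manually in one shell.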

For me, replacing `spark-env` with `yarn-env` fixed the problem.
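Assuming the same file structure as the first answer, swapping in the `yarn-env` classification would look like this (a sketch with the same placeholder names, not a tested config):

```json
[
  {
    "Classification": "yarn-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "VARIABLE_NAME": "VARIABLE_VALUE"
        }
      }
    ]
  }
]
```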


Source: https://habr.com/ru/post/1015184/

