How to set up a custom environment variable in EMR for use in a Spark application

I need to set a custom environment variable in EMR that will be available when the Spark application starts.

I tried to add this:

```shell
... --configurations '[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Configurations": [],
        "Properties": {
          "SOME-ENV-VAR": "qa1"
        }
      }
    ],
    "Properties": {}
  }
]' ...
```

I also tried replacing `spark-env` with `hadoop-env`, but nothing works.

There is this answer on the AWS forums, but I can't understand how to apply it. I run EMR 5.3.1 and launch it with a pre-configured step from the CLI: `aws emr create-cluster...`

2 answers

Add your custom configuration as JSON to a file, say custom_config.json:

```json
[
  {
    "Classification": "spark-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "VARIABLE_NAME": "VARIABLE_VALUE"
        }
      }
    ]
  }
]
```

Then, when creating the EMR cluster, pass the file path to the `--configurations` option:

```shell
aws emr create-cluster --configurations file://custom_config.json --other-options...
```
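Once the cluster comes up with this configuration, the exported variable should be visible to the Spark application's processes as an ordinary environment variable. A minimal sketch of reading it from a PySpark driver script (`VARIABLE_NAME` is just the placeholder name from the config above, not something EMR defines):

```python
import os

# Read the environment variable exported via the spark-env "export"
# classification. VARIABLE_NAME is the placeholder from the sample
# config above; substitute your own variable name.
value = os.environ.get("VARIABLE_NAME", "not set")
print(value)
```

Note that this reads the driver's environment; executors run in their own processes, which is one reason the classification has to be applied cluster-wide rather than exported manually in one shell.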

For me, replacing `spark-env` with `yarn-env` fixed the problem.
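Assuming the same file structure as the first answer, swapping in the `yarn-env` classification would look like this (a sketch with the same placeholder names, not a tested config):

```json
[
  {
    "Classification": "yarn-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "VARIABLE_NAME": "VARIABLE_VALUE"
        }
      }
    ]
  }
]
```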


Source: https://habr.com/ru/post/1015184/

