Custom log4j.properties on AWS EMR

I cannot get Amazon EMR to use a custom log4j.properties. I am running Spark on EMR (YARN) and have tried all of the combinations below in spark-submit to make it pick up the custom log4j configuration.

--driver-java-options "-Dlog4j.configuration=hdfs://host:port/user/hadoop/log4j.properties" --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=hdfs://host:port/user/hadoop/log4j.properties" 

I also tried loading the file from the local file system using file:/// instead of hdfs://. None of this works. However, I can get it to work on my local YARN setup.

Any ideas?

+6
2 answers

log4j knows nothing about HDFS, so it cannot accept an hdfs:// path as a configuration file. See here for more information on configuring log4j in general.

To configure log4j on EMR, you can use the Configuration API to add key-value pairs to the log4j.properties file that is loaded by the driver and executors. In particular, you want to add your properties to the spark-log4j configuration classification.
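As a rough sketch of what such a configuration might look like, the snippet below writes a JSON file that uses the spark-log4j classification. The file name configurations.json, the logger name com.example, and the log levels are illustrative assumptions, not values from the question.

```shell
# Sketch only: file name, logger names, and levels are placeholders.
CONFIG_FILE=configurations.json

# Each key under "Properties" becomes a line in Spark's log4j.properties.
cat > "$CONFIG_FILE" <<'EOF'
[
  {
    "Classification": "spark-log4j",
    "Properties": {
      "log4j.rootCategory": "WARN, console",
      "log4j.logger.com.example": "DEBUG"
    }
  }
]
EOF

# The file would then be supplied when the cluster is created, for example:
#   aws emr create-cluster ... --configurations file://./configurations.json
```

The other create-cluster options are left out here because they depend on your cluster setup.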

+2

Basically, after talking with support and reading the documentation, I see that two options are available for this:

1 - Pass the log4j.properties settings through the configuration supplied when creating the EMR cluster. Jonathan mentioned this in his answer.

2 - Add the --files /path/to/log4j.properties option to the spark-submit command. This distributes the log4j.properties file to the working directory of each Spark executor. Then change your -Dlog4j.configuration setting to point to the file name only: "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties"
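The second option could look roughly like the sketch below. It assumes cluster deploy mode, so that the driver also runs in a YARN container whose working directory contains the distributed file. The application class com.example.MyApp and the jar my-app.jar are hypothetical, and the spark.executor.extraJavaOptions line is an addition for the executors that the answer itself does not mention.

```shell
# Sketch only: path, class, and jar names are placeholders.
LOG4J_FILE=/path/to/log4j.properties
LOG4J_NAME=$(basename "$LOG4J_FILE")            # file name only, no directory
JAVA_OPT="-Dlog4j.configuration=$LOG4J_NAME"

# Print the command for inspection; remove the leading 'echo' to submit for real.
echo spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files "$LOG4J_FILE" \
  --conf "spark.driver.extraJavaOptions=$JAVA_OPT" \
  --conf "spark.executor.extraJavaOptions=$JAVA_OPT" \
  --class com.example.MyApp \
  my-app.jar
```

In client mode the driver runs on the machine that calls spark-submit, so the bare file name would probably not resolve there, and the driver option would need a file: URL pointing at the local copy instead.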

+1

Source: https://habr.com/ru/post/1015265/

