How to specify which version of java to use in spark-submit command?

I want to run a Spark application on a YARN cluster on a remote server. The default Java version there is 1.7, but I want to use 1.8 for my application; it is also installed on the server, just not as the default. Is there a way to point spark-submit at the Java 1.8 location so that I don't get the major.minor version error?

+5
3 answers

In our case JAVA_HOME alone was not enough: the driver ran on Java 8, but I later found that the Spark workers in YARN were launched with Java 7 (the Java version the Hadoop installation on the nodes was using).

I had to add spark.executorEnv.JAVA_HOME=/usr/java/<version available in workers> to spark-defaults.conf. Note that you can also provide it on the command line with --conf.

See http://spark.apache.org/docs/latest/configuration.html#runtime-environment
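A minimal sketch of both options, assuming JDK 8 lives under /usr/java/jdk1.8.0 on the worker nodes (the path and the application jar name are illustrative; spark.yarn.appMasterEnv.JAVA_HOME additionally covers the application master in cluster mode):

```shell
# Option 1: in $SPARK_HOME/conf/spark-defaults.conf
#   spark.executorEnv.JAVA_HOME        /usr/java/jdk1.8.0
#   spark.yarn.appMasterEnv.JAVA_HOME  /usr/java/jdk1.8.0

# Option 2: per job, on the spark-submit command line
spark-submit \
  --master yarn \
  --conf "spark.executorEnv.JAVA_HOME=/usr/java/jdk1.8.0" \
  --conf "spark.yarn.appMasterEnv.JAVA_HOME=/usr/java/jdk1.8.0" \
  my_app.jar
```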

+9

Although you can make the driver code run on a specific Java version (export JAVA_HOME=/path/to/jre/ && spark-submit ...), the workers will execute the code with the default Java version from the PATH of the yarn user on the worker machines.

What you can do is configure each Spark instance to use a specific JAVA_HOME by editing its spark-env.sh file (see the Spark configuration documentation).
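A minimal spark-env.sh sketch (the JDK path is an assumption; adjust it to wherever Java 8 lives on each node):

```shell
# $SPARK_HOME/conf/spark-env.sh
# Point every Spark process launched from this machine at Java 8.
export JAVA_HOME=/usr/java/jdk1.8.0
export PATH="$JAVA_HOME/bin:$PATH"
```

Note that this file must be present (and consistent) on every node, since each Spark instance reads its own local copy.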

+2

Add the JAVA_HOME you want to spark-env.sh. To locate the file, run sudo find / -name spark-env.sh; e.g. /etc/spark2/conf.cloudera.spark2_on_yarn/spark-env.sh.

0

Source: https://habr.com/ru/post/1247935/

