I am trying to use Spark on YARN from a Scala sbt application instead of launching it with spark-submit directly.
I already have a remote YARN cluster, and I can start Spark against it from SparkR. But when I try to do the same in a Scala application, it cannot load the environment variables pointing at the YARN configuration and instead uses the default YARN address and port.
The sbt application is just a simple object:
import org.apache.spark.{SparkConf, SparkContext}

object simpleSparkApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("simpleSparkApp")
      .setMaster("yarn-client")
      .set("SPARK_HOME", "/opt/spark-1.5.1-bin-hadoop2.6")
      .set("HADOOP_HOME", "/opt/hadoop-2.6.0")
      .set("HADOOP_CONF_DIR", "/opt/hadoop-2.6.0/etc/hadoop")
    val sc = new SparkContext(conf)
  }
}
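My suspicion is that SparkConf.set only records an ordinary Spark property and never exports an OS environment variable, so Hadoop's client code would never see these values. A minimal sketch of that suspicion (ConfVsEnv is just a throwaway name for illustration):

import org.apache.spark.SparkConf

object ConfVsEnv {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().set("HADOOP_CONF_DIR", "/opt/hadoop-2.6.0/etc/hadoop")
    // The value is stored as an ordinary Spark property ...
    println(conf.get("HADOOP_CONF_DIR"))    // /opt/hadoop-2.6.0/etc/hadoop
    // ... but the process environment is untouched, so Hadoop cannot find it
    println(sys.env.get("HADOOP_CONF_DIR")) // None (unless set outside the JVM)
  }
}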
When I run this application in IntelliJ IDEA, the log says:
15/11/15 18:46:05 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/11/15 18:46:06 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
15/11/15 18:46:07 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
...
It seems the environment has not been picked up correctly, because 0.0.0.0 is not the IP of the remote YARN ResourceManager node, yet my spark-env.sh has:
export JAVA_HOME="/usr/lib/jvm/ibm-java-x86_64-80"
export HADOOP_HOME="/opt/hadoop-2.6.0"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export SPARK_MASTER_IP="master"
and my yarn-site.xml has:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
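To narrow this down, here is a minimal check I believe could be run in the same JVM (a sketch; EnvCheck is a throwaway name, and it assumes the Hadoop YARN client jars are on the classpath):

import org.apache.hadoop.yarn.conf.YarnConfiguration

object EnvCheck {
  def main(args: Array[String]): Unit = {
    // None here would mean the IDE-launched JVM never received the variable
    println(sys.env.get("HADOOP_CONF_DIR"))
    // YarnConfiguration reads yarn-site.xml only if it is on the classpath;
    // otherwise yarn.resourcemanager.address falls back to the default 0.0.0.0:8032
    val yarnConf = new YarnConfiguration()
    println(yarnConf.get(YarnConfiguration.RM_ADDRESS))
  }
}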
How can I correctly set these environment variables so that this sbt Spark app connects to the remote YARN cluster?
Additional Information:
My system is Ubuntu 14.04, and the SparkR code that can connect to the YARN cluster looks like this:
Sys.setenv(HADOOP_HOME = "/opt/hadoop-2.6.0")
Sys.setenv(SPARK_HOME = "/opt/spark-1.4.1-bin-hadoop2.6")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master = "yarn-client")
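Since SparkR sets these variables in its own process before initializing, and a JVM cannot modify its own environment through sys.env, my expectation is that the Scala equivalent would look something like the sketch below (SimpleSparkAppExpected is a hypothetical name; it assumes HADOOP_CONF_DIR is exported in the shell or IDE run configuration before the JVM starts):

import org.apache.spark.{SparkConf, SparkContext}

object SimpleSparkAppExpected {
  def main(args: Array[String]): Unit = {
    // Assumes HADOOP_CONF_DIR=/opt/hadoop-2.6.0/etc/hadoop is already set in this
    // JVM's environment (e.g. via the IDE run configuration), mirroring the
    // Sys.setenv calls on the SparkR side; the JVM cannot set it for itself.
    val conf = new SparkConf()
      .setAppName("simpleSparkApp")
      .setMaster("yarn-client")
    val sc = new SparkContext(conf)
  }
}

Is this the right approach, or is there a way to supply the YARN configuration programmatically?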