Exception in thread "main" java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment. (Spark)

I am new to Apache Spark. I have tested some applications in Spark local mode, but now I want to run my application in YARN mode. I am running apache-spark 2.1.0 on Windows. Here is my command:

c:\spark>spark-submit2 --master yarn --deploy-mode client --executor-cores 4 --jars C:\DependencyJars\spark-streaming-eventhubs_2.11-2.0.3.jar,C:\DependencyJars\scalaj-http_2.11-2.3.0.jar,C:\DependencyJars\config-1.3.1.jar,C:\DependencyJars\commons-lang3-3.3.2.jar --conf spark.driver.userClasspathFirst=true --conf spark.executor.extraClassPath=C:\DependencyJars\commons-lang3-3.3.2.jar --conf spark.executor.userClasspathFirst=true --class "GeoLogConsumerRT" C:\sbtazure\target\scala-2.11\azuregeologproject_2.11-1.0.jar

Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment. (Spark)

Following a web search, I created a folder named HADOOP_CONF_DIR, placed hive-site.xml in it, and set it as an environment variable. After that I ran spark-submit again and got:

connection failure. I think I have not configured YARN mode correctly. Can someone help me solve this problem? Do I need to install Hadoop and YARN separately? I want to run my application in pseudo-distributed mode. Please help me configure YARN mode on Windows. Thanks.

1 answer

You need to export two variables, HADOOP_CONF_DIR and YARN_CONF_DIR, to make your configuration files visible to YARN. Add the lines below to your .bashrc file if you are using Linux.

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

On Windows, you need to set the corresponding environment variables instead.
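For example, from a Windows command prompt you could persist the two variables with setx. This is a minimal sketch assuming HADOOP_HOME already points at your Hadoop installation and that its configuration files (core-site.xml, yarn-site.xml, etc.) live under etc\hadoop, which is the default layout:

```shell
REM Sketch for Windows cmd; assumes HADOOP_HOME is already set to your Hadoop install directory.
REM setx writes the variables to the user environment; open a NEW command prompt afterwards
REM (setx does not affect the current session) before running spark-submit again.
setx HADOOP_CONF_DIR "%HADOOP_HOME%\etc\hadoop"
setx YARN_CONF_DIR "%HADOOP_HOME%\etc\hadoop"
```

Alternatively, you can set both variables through Control Panel > System > Advanced system settings > Environment Variables.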

Hope this helps!


Source: https://habr.com/ru/post/1678778/
