How to configure Zeppelin to work with a remote EMR Yarn cluster

I have an Amazon EMR cluster (Hadoop 2.6, Spark 1.4.1) that uses Yarn as its resource manager. I want to deploy Zeppelin on a separate machine so that I can shut down the EMR cluster when there are no jobs to run.

I tried following the instructions at https://zeppelin.incubator.apache.org/docs/install/yarn_install.html with little success.

Can anyone explain the steps needed to connect Zeppelin to an existing Yarn cluster from another machine?

1 answer

[1] Install Zeppelin with the appropriate build parameters:

git clone https://github.com/apache/incubator-zeppelin.git ~/zeppelin;
cd ~/zeppelin;
mvn clean package -Pspark-1.4 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -DskipTests

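[2] Make sure the Zeppelin machine can reach the EMR_MASTER EC2 instance over the network (for example, by placing both in the same security group).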

[3] Copy EMR_MASTER:/etc/hadoop/conf to MY_STANDALONE_SERVER:/home/zeppelin/hadoop-conf.

[4] Add the following to zeppelin/conf/zeppelin-env.sh:

export MASTER=yarn-client
export HADOOP_CONF_DIR=/home/zeppelin/hadoop-conf

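That's it. After restarting Zeppelin, set interpreter properties such as spark.executor.instances to suit your cluster, and it should work.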
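
Before restarting Zeppelin, it may be worth checking that the directory copied in step [3] actually contains the files Spark needs to find the cluster. The sketch below is not part of the original answer: the function name check_hadoop_conf is hypothetical, and it assumes that core-site.xml and yarn-site.xml are the minimum needed in HADOOP_CONF_DIR for yarn-client mode.

```shell
#!/bin/sh
# Sanity check for the Hadoop configuration copied in step [3].
# Assumption: yarn-client mode needs at least core-site.xml and
# yarn-site.xml in HADOOP_CONF_DIR to locate the ResourceManager.
# check_hadoop_conf is a hypothetical helper, not a Zeppelin or EMR tool.

check_hadoop_conf() {
    conf_dir="$1"
    for f in core-site.xml yarn-site.xml; do
        if [ ! -f "$conf_dir/$f" ]; then
            # Report the first missing file and signal failure.
            echo "missing: $conf_dir/$f"
            return 1
        fi
    done
    echo "ok: $conf_dir"
}

# Example call, using the path from step [3]:
# check_hadoop_conf /home/zeppelin/hadoop-conf
```

If the check reports a missing file, repeating the copy from EMR_MASTER:/etc/hadoop/conf (for instance with scp -r) is the first thing to try.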


Source: https://habr.com/ru/post/1607530/

