How to expose the Spark driver from an Apache Zeppelin Docker container?

I am currently building a custom Docker container based on a plain distribution, with Apache Zeppelin + Spark 2.x inside.

My Spark jobs will run on a remote cluster, and I am using yarn-client as the master.

When I start a notebook and try to print sc.version, the program gets stuck. If I go to the remote resource manager, the application is created and accepted, but in the logs I can read:

INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable

My understanding of the situation is that the cluster cannot talk to the driver in the container, but I do not know how to solve this problem.

I am currently using the following configuration (sketched as a docker run invocation below):

  • spark.driver.port set to PORT1 and the option -p PORT1:PORT1 passed to the container
  • spark.driver.host set to 172.17.0.2 (the container IP)
  • SPARK_LOCAL_IP set to 172.17.0.2 (the container IP)
  • spark.ui.port set to PORT2 and the option -p PORT2:PORT2 passed to the container
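To make the setup concrete, here is a minimal sketch of what that configuration could look like as a docker run invocation. The image name (zeppelin-spark) and the concrete port numbers standing in for PORT1/PORT2 are placeholders, not values from the original post.

```sh
# Hypothetical sketch of the setup described above.
# Placeholders: zeppelin-spark (image), 40000 = PORT1, 4040 = PORT2.
docker run -d --name zeppelin \
  -p 40000:40000 \
  -p 4040:4040 \
  -e SPARK_LOCAL_IP=172.17.0.2 \
  zeppelin-spark

# Matching Spark interpreter properties (or conf/spark-defaults.conf):
#   master             yarn-client
#   spark.driver.host  172.17.0.2
#   spark.driver.port  40000
#   spark.ui.port      4040
```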

I have the feeling that I need to change SPARK_LOCAL_IP to the host IP, but if I do, the Spark UI cannot start, which blocks the process one step earlier.

Thanks in advance for any ideas / tips!

1 answer

Good question! First of all, as you know, Apache Zeppelin runs interpreters in separate processes.

[Figure: Apache Zeppelin architecture]

In your case, the Spark interpreter JVM process hosts the SparkContext and acts as the SparkDriver for the yarn-client deployment mode. According to the Apache Spark documentation, this process inside the container must be able to communicate to and from the YARN ApplicationMaster and all the SparkWorkers of the cluster.

[Figure: Apache Spark architecture]

This means that a number of ports have to be opened and manually forwarded between the container and the host machine. At ZEPL, a similar setup ended up requiring 7 forwarded ports to get the job done.
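
As a rough illustration of that approach (the exact ports used at ZEPL are not listed here), the idea is to pin every port the driver listens on to a fixed value and publish each of them; all port numbers and the image name below are placeholders:

```sh
# Hedged sketch of the port-forwarding approach; ports and image name
# are illustrative placeholders only.
docker run -d --name zeppelin \
  -p 40000:40000 \
  -p 40001:40001 \
  -p 4040:4040 \
  zeppelin-spark

# Matching Spark properties, so the driver always listens on the published ports:
#   spark.driver.port        40000   # driver RPC endpoint
#   spark.blockManager.port  40001   # block manager traffic
#   spark.ui.port            4040    # Spark UI
#   spark.driver.host        <address of the Docker host reachable from YARN>
```

The point of pinning is that, by default, Spark picks these ports at random, which makes them impossible to publish in advance.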

Another approach would be to run the Docker container with host networking (although it apparently does not work on OS X, due to a bug).
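
A minimal sketch of that alternative, assuming a Linux Docker host where host networking is effective (the image name is again a placeholder):

```sh
# Host networking: the container shares the host's network stack, so every
# port the driver opens is reachable from the cluster without -p mappings.
docker run -d --name zeppelin --net=host zeppelin-spark
```

With host networking there is no address translation, so SPARK_LOCAL_IP and spark.driver.host can simply be the host's own address.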


Source: https://habr.com/ru/post/1663405/

