I am currently building a custom Docker container from a plain distribution, with Apache Zeppelin + Spark 2.x inside.
My Spark jobs will run on a remote cluster, and I am using yarn-client as the master.
When I start a notebook and try to print sc.version, the program gets stuck. If I go to the remote resource manager, the application was created and accepted, but in its logs I can read:
INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable
My understanding of the situation is that the cluster cannot talk to the driver in the container, but I do not know how to solve this problem.
I am currently using the following configuration:
- spark.driver.port set to PORT1, and the option -p PORT1:PORT1 passed to the container
- spark.driver.host set to 172.17.0.2 (the container IP)
- SPARK_LOCAL_IP set to 172.17.0.2 (the container IP)
- spark.ui.port set to PORT2, and the option -p PORT2:PORT2 passed to the container
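For concreteness, here is roughly how those settings fit together. This is a sketch, not my exact setup: PORT1/PORT2 are placeholders for the real port numbers, the image name is hypothetical, and 172.17.0.2 is just Docker's default bridge address for the first container.

```shell
# Publish the driver and UI ports and pin the container's local IP
# (PORT1/PORT2 and my-zeppelin-image are placeholders):
docker run -p PORT1:PORT1 -p PORT2:PORT2 \
  -e SPARK_LOCAL_IP=172.17.0.2 \
  my-zeppelin-image

# Matching Spark properties (set in the Zeppelin Spark interpreter
# settings or in conf/spark-defaults.conf):
#   spark.master       yarn-client
#   spark.driver.host  172.17.0.2
#   spark.driver.port  PORT1
#   spark.ui.port      PORT2
```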
I have the feeling that I need to change SPARK_LOCAL_IP to the host's IP, but if I do, the Spark UI cannot start, which blocks the process one step earlier.
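One direction that might resolve this bind-vs-advertise conflict, assuming Spark 2.1 or later: spark.driver.bindAddress (added in 2.1) lets the driver bind its sockets to the container IP while spark.driver.host advertises a different, externally reachable address to the cluster. HOST_IP below is a placeholder for the Docker host's address as seen by YARN.

```shell
# Config sketch, assuming Spark >= 2.1 (spark.driver.bindAddress):
#   spark.driver.bindAddress  172.17.0.2   # address to bind inside the container
#   spark.driver.host         HOST_IP      # address YARN/executors connect back to
```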
Thanks in advance for any ideas / tips!