Troubleshooting an Apache Spark application running in client mode from a Docker container

I am trying to connect to a standalone Apache Spark cluster from a Dockerized Apache Spark application running in client mode.

The driver advertises its address to the Spark master and the workers. When it is launched inside a Docker container, it uses some_docker_container_ip. That address is not visible from outside the container, so the application does not work.
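
For concreteness, here is a rough approximation of the default address lookup the driver falls back to when nothing is configured (Spark's actual resolution logic is more involved; this is just a sketch to show what the container reports):

```scala
import java.net.InetAddress

// Inside the container this prints the container's internal IP
// (the some_docker_container_ip mentioned above), which the
// standalone master and workers cannot reach.
object WhoAmI {
  def main(args: Array[String]): Unit = {
    val addr = InetAddress.getLocalHost
    println(s"hostname = ${addr.getHostName}, ip = ${addr.getHostAddress}")
  }
}
```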

Spark has a property, spark.driver.host, whose value is passed on to the master and the workers. My first instinct was to set it to the host machine's address, so that the cluster would contact the visible machine instead.
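
A sketch of that first attempt (the master URL spark://spark-master:7077 and the IP 192.168.1.10 are hypothetical placeholders for the standalone master and the host machine):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

val conf = new SparkConf()
  .setAppName("docker-client-mode")
  .setMaster("spark://spark-master:7077")
  // Advertised to the master and workers; also used by the driver's
  // own server, which is exactly the problem described below.
  .set("spark.driver.host", "192.168.1.10")
val sc = new SparkContext(conf)
```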

Unfortunately, spark.driver.host is also used to set up the server inside the driver. Setting it to the host machine's address leads to server startup errors, because the Docker container cannot bind ports on the host machine's address.

It looks like a lose-lose situation: I can use neither the host address nor the Docker container address.

Ideally, I would like to have two properties: spark.driver.host-to-bind-to, used to set up the driver's server, and spark.driver.host-for-master, which would be used by the master and the workers. Unfortunately, it looks like I'm stuck with just one property.
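
If upgrading Spark were an option, my understanding is that later releases (around Spark 2.1, if I'm not mistaken) added exactly this split as spark.driver.bindAddress, with spark.driver.host remaining the advertised address. A minimal sketch, assuming such a version; all addresses and ports below are hypothetical:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("docker-client-mode")
  .setMaster("spark://spark-master:7077")
  // Address advertised to the master and workers (host machine's IP).
  .set("spark.driver.host", "192.168.1.10")
  // Address the driver's server binds to inside the container.
  .set("spark.driver.bindAddress", "0.0.0.0")
  // Pinned so the ports can be published from the container.
  .set("spark.driver.port", "35000")
  .set("spark.blockManager.port", "36000")
```

The container would then presumably need those two ports published to the host (e.g. docker run -p 35000:35000 -p 36000:36000) so that the cluster can connect back to the driver.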

Another approach would be to use --net=host when launching the Docker container. That approach has many drawbacks (for example, other Docker containers cannot link to a container with --net=host and must be exposed outside the Docker network), and I would like to avoid it.

Any ideas how to work around this?

Source: https://habr.com/ru/post/1648808/

