How to run Spark on Docker?

Cannot start Apache Spark on Docker.

When I try to connect my driver to the master, I get the following error:

04/15/03 13:08:28 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

+6
2 answers

This error sounds like the workers did not register with the master.

You can check this on the Spark master web UI at http://<masterip>:8080
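
If you prefer the command line, the standalone master also serves its status as JSON, so something like this (assuming curl is installed and 8080 is the master UI port) shows which workers it thinks are registered:

# Query the master's JSON status page instead of the browser UI.
curl http://<masterip>:8080/json
# The reply lists the registered workers plus total and used cores/memory;
# an empty "workers" array means no worker has reached the master at all.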

You can also just use a different docker image, or compare your docker image with one that works and see what is different.

I have dockerized a Spark master and a Spark worker.

If you have a Linux machine sitting behind a NAT router, such as a home firewall that hands out addresses on the private network 192.168.1.*, this script downloads and launches a Spark 1.3.1 master and a worker in separate docker containers with the addresses 192.168.1.10 and .11, respectively. You may need to adjust the addresses if 192.168.1.10 and 192.168.1.11 are already in use on your LAN.

pipework is a utility that bridges a container onto the local network instead of using the internal docker bridge.

Spark requires all of the machines to be able to communicate with each other. As far as I can tell, Spark is not hierarchical; I have seen workers try to open ports to each other. So in the shell script I expose all the ports, which is fine if the machines are otherwise firewalled, for example behind a home NAT router.

The docker-spark launch script:

#!/bin/bash
sudo -v
# Master container: fixed hostname/IP mappings for every node, all ports exposed.
MASTER=$(docker run --name="master" -h master --add-host master:192.168.1.10 \
  --add-host spark1:192.168.1.11 --add-host spark2:192.168.1.12 \
  --add-host spark3:192.168.1.13 --add-host spark4:192.168.1.14 \
  --expose=1-65535 --env SPARK_MASTER_IP=192.168.1.10 \
  -d drpaulbrewer/spark-master:latest)
# Give the master container a real LAN address via pipework.
sudo pipework eth0 $MASTER 192.168.1.10/24@192.168.1.1
# First worker container, pointed at the master's LAN address.
SPARK1=$(docker run --name="spark1" -h spark1 --add-host home:192.168.1.8 \
  --add-host master:192.168.1.10 --add-host spark1:192.168.1.11 \
  --add-host spark2:192.168.1.12 --add-host spark3:192.168.1.13 \
  --add-host spark4:192.168.1.14 --expose=1-65535 --env mem=10G \
  --env master=spark://192.168.1.10:7077 -v /data:/data -v /tmp:/tmp \
  -d drpaulbrewer/spark-worker:latest)
sudo pipework eth0 $SPARK1 192.168.1.11/24@192.168.1.1

After running this script, I can see the master web UI at 192.168.1.10:8080, or go to another machine on my LAN that has a Spark distribution and run ./spark-shell --master spark://192.168.1.10:7077, which brings up the interactive Scala shell.
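
As a quick end-to-end check (just a sketch, not part of the original setup), you can pipe a tiny job into spark-shell and confirm that it actually runs on the cluster:

# Runs a trivial job against the standalone master; adjust the master URL to your setup.
echo 'println(sc.parallelize(1 to 1000).sum())' | ./bin/spark-shell --master spark://192.168.1.10:7077
# If this hangs with the "Initial job has not accepted any resources" warning,
# the workers are still unreachable or have no free cores/memory.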

+5

The second possible cause, and the more common one in the docker case, is networking. You have to check that you (a rough sketch follows the list):

  • Expose all necessary ports
  • Set the correct spark.broadcast.factory
  • Use docker aliases
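
Here is what those three points can look like with plain docker commands; the image name mysparkimage, the /opt/spark path and the published ports are placeholders, not something from the original posts:

# 1. Publish the ports Spark uses: 7077 (master), 8080/8081 (web UIs), plus any
#    driver/executor ports you pin in your Spark config.
docker run -d --name spark-master -h spark-master -p 7077:7077 -p 8080:8080 \
  mysparkimage /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master

# 2. On Spark 1.x the default torrent broadcast needs executor-to-executor traffic;
#    forcing the HTTP factory routes broadcasts through the driver instead:
#    spark-submit ... --conf spark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory

# 3. Use a docker link/alias so the name "spark-master" resolves inside the worker container.
docker run -d --name spark-worker -h spark-worker --link spark-master:spark-master \
  -p 8081:8081 mysparkimage /opt/spark/bin/spark-class \
  org.apache.spark.deploy.worker.Worker spark://spark-master:7077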

Without addressing all 3 issues, the Spark cluster parts (master, worker, driver) cannot communicate. You can read about each issue in detail at http://sometechshit.blogspot.ru/2015/04/running-spark-standalone-cluster-in.html or use a ready-made Spark container from https://registry.hub.docker.com/u/epahomov/docker-spark/

If the problem is resources, try to allocate fewer resources (number of executors, memory, cores) using the flags from https://spark.apache.org/docs/latest/configuration.html. Check how many resources you actually have on the Spark master UI page, which is http://localhost:8080 by default.
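
For example (the numbers here are placeholders, pick them from whatever the master UI reports as free):

# Ask for a deliberately small slice of the cluster; if this works, the original
# job was simply requesting more than the workers offer.
./bin/spark-shell --master spark://<masterip>:7077 \
  --executor-memory 512m \
  --total-executor-cores 2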

+4

Source: https://habr.com/ru/post/984654/

