What happens if the driver program fails?

I understand that the worker nodes are fault tolerant, but what happens if your driver program fails for some unexpected reason (power outage / memory issue, etc.)?

My guess is that you lose all the work, since the code that reads back the results is no longer running. Or does Spark somehow know how to restart the driver? If so, how?

+5
4 answers

As @zsxwing points out, it depends on how you run your driver. In addition to running on YARN, you can also submit your job in cluster deploy mode (this is a parameter to spark-submit). With Spark Streaming, you specify --supervise and Spark will restart the job for you. See the Spark Streaming Guide for more details.
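For Spark Streaming specifically, --supervise (or a YARN restart) only re-launches the driver process; to actually pick up where it left off, the Streaming Guide's approach is to enable checkpointing and rebuild the StreamingContext from the checkpoint on restart. A minimal Scala sketch of that pattern, assuming a hypothetical checkpoint directory and app name (not taken from the answer):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object CheckpointedApp {
      // Hypothetical checkpoint location and batch interval.
      val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

      def createContext(): StreamingContext = {
        val conf = new SparkConf().setAppName("CheckpointedApp")
        val ssc = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint(checkpointDir)
        // ... define the streaming computation here ...
        ssc
      }

      def main(args: Array[String]): Unit = {
        // Fresh start: builds a new context. After a supervised driver restart:
        // rebuilds the context and pending batches from the checkpoint data.
        val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
        ssc.start()
        ssc.awaitTermination()
      }
    }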

+5

Yes, you can restart Spark applications. There are a few options available, specific to the cluster manager in use. For example, with a Spark standalone cluster in cluster deploy mode, you can also specify --supervise to make sure that the driver is automatically restarted if it fails with a non-zero exit code. To list all such options available to spark-submit, run it with --help:

# Run on a Spark standalone cluster in cluster deploy mode with supervise

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  /path/to/examples.jar \
  1000

+2

We can use ZooKeeper or the local file system to configure high availability; see the official documentation:

http://spark.apache.org/docs/latest/spark-standalone.html#high-availability
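
For reference, the page linked above configures standby masters by pointing the standalone daemons at ZooKeeper, typically via SPARK_DAEMON_JAVA_OPTS in conf/spark-env.sh. A sketch under that assumption; the host names are placeholders, not values from the answer:

    # Sketch of conf/spark-env.sh for ZooKeeper-based standalone master recovery
    # (zk1/zk2/zk3 are placeholder hosts).
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
      -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
      -Dspark.deploy.zookeeper.dir=/spark"

Single-node recovery with the local file system uses spark.deploy.recoveryMode=FILESYSTEM and spark.deploy.recoveryDirectory instead. Note that this provides failover for the standalone master; supervising the driver itself is still done with --supervise, as in the other answers.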

+1

According to the Spark documentation:

Spark Standalone - A Spark application driver can be submitted to run within the Spark Standalone cluster (see cluster deploy mode), that is, the application driver itself runs on one of the worker nodes. Furthermore, the Standalone cluster manager can be instructed to supervise the driver and relaunch it if the driver fails either due to a non-zero exit code or due to failure of the node running the driver. See cluster mode and supervise in the Spark Standalone guide for more details.

So --supervise will only work with standalone cluster mode. If your application is submitted in yarn cluster mode, then YARN handles restarting the driver, as configured by the mapreduce.am.max-attempts property in mapred-site.xml. Your code should therefore be written so that it removes the output directory and starts from scratch; otherwise it will fail with an "output directory already exists" error.
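
One way to get the idempotent behaviour this answer asks for is to delete the output directory from the driver code before writing. A minimal Scala sketch under that assumption, with a hypothetical output path:

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.{SparkConf, SparkContext}

    object IdempotentOutput {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("IdempotentOutput"))

        // Hypothetical output location; not taken from the answer.
        val output = new Path("hdfs:///tmp/my-job-output")

        // Remove anything left behind by a previous (failed) attempt so that a
        // restarted driver does not die with "output directory already exists".
        val fs = output.getFileSystem(sc.hadoopConfiguration)
        if (fs.exists(output)) fs.delete(output, true)

        sc.parallelize(1 to 100).saveAsTextFile(output.toString)
        sc.stop()
      }
    }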

0

Source: https://habr.com/ru/post/1205730/

