Spark on YARN: run driver without workers

I am running Spark on YARN in cluster mode:

  • 3 data nodes with YARN
  • YARN => 32 vCores, 32 GB RAM

I am submitting the Spark application as follows:

spark-submit \
    --class com.blablacar.insights.etl.SparkETL \
    --name ${JOB_NAME} \
    --master yarn \
    --num-executors 1 \
    --deploy-mode cluster \
    --driver-memory 512m \
    --driver-cores 1 \
    --executor-memory 2g \
    --executor-cores 20 \
    toto.jar json

I can see that 2 jobs are running fine, on 2 nodes. But I can also see 2 other jobs that have only a driver container!

[Screenshot: YARN web UI]
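For anyone who wants to confirm this outside the web UI, here is a minimal sketch using the stock YARN CLI; the application and attempt IDs below are placeholders, not from the original question:

    # List running applications; "driver-only" apps show up as RUNNING too,
    # because the AM (driver) container has already been allocated:
    yarn application -list -appStates RUNNING

    # Drill down into one application: a "driver-only" app lists a single
    # container, the Application Master itself:
    yarn applicationattempt -list application_1500000000000_0001
    yarn container -list appattempt_1500000000000_0001_000001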

Is it possible to prevent YARN from starting the driver when there are no resources left for the workers?

1 answer

In fact, there is a limit on the resources that can be used by the "Application Master" (in the case of Spark, this is the driver):

yarn.scheduler.capacity.maximum-am-resource-percent

From http://maprdocs.mapr.com/home/AdministratorGuide/Hadoop2.xCapacityScheduler-RunningPendingApps.html:

Maximum percent of resources in the cluster which can be used to run application masters - controls the number of concurrent running applications.

Thanks to this limit, YARN will not give all of the cluster's resources to Spark drivers. Youpi!
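For completeness, a minimal sketch of how this limit could be tuned; the 0.2 value is purely illustrative and the path to the file varies by distribution:

    # In capacity-scheduler.xml on the ResourceManager (the default value is 0.1,
    # i.e. at most 10% of the queue's resources may go to Application Masters):
    #
    #   <property>
    #     <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    #     <value>0.2</value>
    #   </property>
    #
    # Apply the change without restarting the ResourceManager:
    yarn rmadmin -refreshQueues

The effective per-queue AM limit ("Max Application Master Resources") should also be visible on the scheduler page of the ResourceManager web UI.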
