I am running Spark on YARN in cluster mode:
- 3 data nodes with YARN
- YARN => 32 vCores, 32 GB RAM (see the yarn-site.xml sketch below)
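In case it helps, this is roughly how those limits look in yarn-site.xml on each data node (I'm assuming the 32 vCores / 32 GB figure is per NodeManager; the property names are the standard YARN ones, the values just mirror the specs above):

<!-- yarn-site.xml on each data node: resources the NodeManager offers to YARN -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>32768</value>   <!-- 32 GB available for containers -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>32</value>      <!-- 32 vCores available for containers -->
</property>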
I am submitting the Spark application as follows:
spark-submit \
--class com.blablacar.insights.etl.SparkETL \
--name ${JOB_NAME} \
--master yarn \
--num-executors 1 \
--deploy-mode cluster \
--driver-memory 512m \
--driver-cores 1 \
--executor-memory 2g \
--executor-cores 20 \
toto.jar json
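As a back-of-the-envelope check, each submission should ask YARN for two containers. The sketch below is just my own arithmetic, assuming Spark's default memory overhead of max(384 MB, 10% of the heap) and YARN's default yarn.scheduler.minimum-allocation-mb of 1024 MB (both are assumptions, I have not changed them explicitly):

# Rough per-application resource request under the assumptions above.
import math

def container_mb(heap_mb, min_alloc_mb=1024):
    overhead = max(384, int(0.10 * heap_mb))                # Spark's default overhead rule
    return math.ceil((heap_mb + overhead) / min_alloc_mb) * min_alloc_mb  # YARN rounds up to the allocation increment

driver_mb = container_mb(512)     # -> 1024 MB, plus 1 vCore   (the ApplicationMaster container)
executor_mb = container_mb(2048)  # -> 3072 MB, plus 20 vCores (the single executor)
print(driver_mb, executor_mb)     # roughly 4 GB and 21 vCores per application in total

So, if my assumptions about the defaults are right, each application needs about 4 GB of RAM and 21 vCores in total.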
I can see that 2 jobs are running fine on 2 nodes, but I also see 2 other jobs with only the driver container and no executor!

Is it possible for the driver to run even when there are no resources left for the executors (workers)?