Large scheduler delay in Apache Spark tasks when using cluster deploy mode

Submitting the application with spark-submit --master yarn --deploy-mode cluster causes noticeably larger scheduler delays than submitting it with --master yarn --deploy-mode client.

[Screenshot: task results]

This mainly affects jobs that call collect on an RDD.

In client mode the Spark application takes about 3-4 minutes; in cluster mode it takes 6-7 minutes. Each job in these steps handles less than 100 KB of data. The cluster has 8 data nodes and is managed with Cloudera Manager 5.9.0.
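
For context, the Spark UI describes scheduler delay as the time to ship the task from the scheduler to the executor plus the time to send the task result back. A minimal sketch of how that figure can be derived from per-task timings follows; the function name and argument names are my own, and the exact formula is an assumption based on my understanding of the UI's accounting, not something taken from this job.

```python
def scheduler_delay_ms(launch_time, finish_time, executor_run_time,
                       executor_deserialize_time, result_serialization_time,
                       getting_result_time=0):
    """Approximate a task's scheduler delay; all arguments are in milliseconds.

    Assumed formula: wall-clock task duration minus every component that is
    reported separately (run time, deserialization, result serialization,
    and time spent fetching the result). Never negative.
    """
    duration = finish_time - launch_time
    overhead = executor_deserialize_time + result_serialization_time
    return max(0, duration - executor_run_time - overhead - getting_result_time)


# Hypothetical task: 1000 ms wall clock, of which only 260 ms is accounted for.
example = scheduler_delay_ms(
    launch_time=0,
    finish_time=1000,
    executor_run_time=200,
    executor_deserialize_time=50,
    result_serialization_time=10,
)
print(example)  # 740
```

Because whatever is left over after the measured components lands in this number, slow network transfers between the executors and the driver show up here even when the tasks themselves are small.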

1 answer

The fix in this particular case: the problem was caused by a damaged Ethernet cable in the cluster infrastructure. After the cable was replaced, the time dropped significantly.
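
If you suspect something similar, one rough check is to time TCP connections from the host running the driver to each node and look for an outlier. This is only a sketch: the host names and port in the usage comment are hypothetical, and a faulty link may show up as packet loss or retransmissions rather than slow connects, so a clean result here does not rule out a hardware problem.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=3.0):
    """Return the time in milliseconds taken to open a TCP connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection is closed immediately; only the handshake is timed
    return (time.perf_counter() - start) * 1000.0


def slowest_nodes(nodes, port, attempts=5):
    """Return (host, worst_ms) pairs sorted with the slowest host first."""
    results = []
    for host in nodes:
        worst = max(tcp_connect_ms(host, port) for _ in range(attempts))
        results.append((host, worst))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Usage (hypothetical host names and port; substitute a port that is
# actually open on your nodes):
#   for host, ms in slowest_nodes(["node1", "node2", "node3"], 22):
#       print(f"{host}: worst connect time {ms:.1f} ms")
```

A node whose worst connect time is far higher than the others is a reasonable place to start checking cables and switch ports.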


Source: https://habr.com/ru/post/1260462/