Spark: Executor Lost Failure (After adding a GroupBy job)

I am trying to run a Spark job on YARN in client mode. I have two nodes, each with the following configuration (the node hardware specs were given in a screenshot that is not reproduced here).

I'm getting "ExecutorLostFailure (executor 1 lost)".

I have tried most of the Spark tuning settings. I managed to get down to one lost executor; initially I was getting as many as 6 executor failures.

This is my configuration (my spark-submit):

HADOOP_USER_NAME=hdfs spark-submit --class genkvs.CreateFieldMappings --master yarn-client --driver-memory 11g --executor-memory 11G --total-executor-cores 16 --num-executors 15 --conf "spark.executor.extraJavaOptions=-XX:+UseCompressedOops -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" --conf spark.akka.frameSize=1000 --conf spark.shuffle.memoryFraction=1 --conf spark.rdd.compress=true --conf spark.core.connection.ack.wait.timeout=800 my-data/lookup_cache_spark-assembly-1.0-SNAPSHOT.jar -h hdfs://hdp-node-1.zone24x7.lk:8020 -p 800
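For reference, a minimal Scala sketch of the same tuning expressed through SparkConf (assumes Spark 1.x, where spark.shuffle.memoryFraction and spark.akka.frameSize still apply; values set directly on SparkConf take precedence over spark-submit flags):

    import org.apache.spark.{SparkConf, SparkContext}

    // Same settings as the spark-submit command above, set in code
    val conf = new SparkConf()
      .setAppName("CreateFieldMappings")
      .setMaster("yarn-client")
      .set("spark.executor.memory", "11g")
      .set("spark.akka.frameSize", "1000")
      .set("spark.shuffle.memoryFraction", "1")
      .set("spark.rdd.compress", "true")
      .set("spark.core.connection.ack.wait.timeout", "800")
    val sc = new SparkContext(conf)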

My data is about 6 GB, and the failure shows up when the job does a groupBy:

    import org.apache.spark.rdd.RDD

    // groupBy buffers all values for each key (the 4th field) on one executor
    def process(in: RDD[(String, String, Int, String)]) = {
        in.groupBy(_._4)
    }

I am new to Spark, so any pointers on what I am doing wrong here would be appreciated.

Thanks in advance.

Answer:

A few observations:

  • You set spark.shuffle.memoryFraction to 1. Why not leave it at the default of 0.2? Giving everything to shuffle leaves no memory for storage or for the tasks themselves.

  • You give 11G to an executor that runs 16 cores. That 11G is already split across three regions (storage, shuffle and task working memory), and you have set the shuffle fraction to 1. With 16 concurrent tasks, each task gets on the order of 11G / 16 ≈ 700 MB, which easily ends in an OOME / a lost executor (a revised command follows these notes).
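A sketch of a revised spark-submit along these lines; the class name, jar path and host come from the question, while 4 executor cores is an assumed value, not something the answer prescribes. Dropping the spark.shuffle.memoryFraction override restores the 0.2 default, and --executor-cores (the flag YARN actually honors, unlike --total-executor-cores) caps concurrency so each task gets roughly 11G / 4 ≈ 2.7 GB instead of ~700 MB:

    HADOOP_USER_NAME=hdfs spark-submit --class genkvs.CreateFieldMappings --master yarn-client --driver-memory 11g --executor-memory 11G --executor-cores 4 --num-executors 15 --conf "spark.executor.extraJavaOptions=-XX:+UseCompressedOops" --conf spark.akka.frameSize=1000 --conf spark.rdd.compress=true --conf spark.core.connection.ack.wait.timeout=800 my-data/lookup_cache_spark-assembly-1.0-SNAPSHOT.jar -h hdfs://hdp-node-1.zone24x7.lk:8020 -p 800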


Source: https://habr.com/ru/post/1615356/

