What does "Stage Skipped" mean in the Apache Spark web interface?

From my Spark UI. What does "skipped" mean here?

[Image: Spark UI screenshot]

+60
apache-spark rdd
Jan 03 '15 at 19:26
1 answer

This usually means that the data was fetched from cache, so the stage did not need to be re-executed. That is consistent with your DAG, which shows that the next stage requires a shuffle (reduceByKey). Whenever a shuffle occurs, Spark automatically caches the generated data. From the Spark programming guide:

Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files are preserved until the corresponding RDDs are no longer used and are garbage collected. This is done so the shuffle files don't need to be re-created if the lineage is re-computed.

+77
Jan 03 '15 at 20:19