This usually means that the data was fetched from the cache, so the stage did not have to be re-executed. It is consistent with your DAG, which shows that the next stage requires a shuffle (`reduceByKey`). Whenever a shuffle is involved, Spark automatically persists the generated data:
Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files are preserved until the corresponding RDDs are no longer used and are garbage collected. This is done so the shuffle files don't need to be re-created if the lineage is re-computed.
zero323 Jan 03 '15 at 20:19