As an under-the-hood optimization, Spark writes the intermediate data produced by a shuffle operation to disk. When Spark needs to recompute part of an RDD graph, it can truncate that RDD's lineage if the data is already available as a side effect of an earlier shuffle. This can happen even if the RDD is not cached or explicitly persisted.
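To make the idea concrete, here is a toy model in plain Python. It is not Spark code and does not use any Spark API: `shuffle_files`, `map_side`, and `reduce_side` are hypothetical names invented for this sketch, and a dictionary stands in for the shuffle files on disk. It only illustrates the general pattern of a second computation skipping the pre-shuffle work because the shuffle output already exists.

```python
from collections import defaultdict

# Hypothetical stand-in for shuffle files on local disk: shuffle id -> map output.
shuffle_files = {}
# Counts how many times the pre-shuffle (map-side) work actually runs.
map_calls = 0


def map_side(records):
    """Pre-shuffle stage: group each record under a key (here, its parity)."""
    global map_calls
    map_calls += 1
    buckets = defaultdict(list)
    for x in records:
        buckets[x % 2].append(x)
    return dict(buckets)


def reduce_side(shuffle_id, records):
    """Post-shuffle stage: reuse existing map output, else compute and store it."""
    if shuffle_id not in shuffle_files:
        shuffle_files[shuffle_id] = map_side(records)
    # Sum the values collected under each key.
    return {key: sum(values) for key, values in shuffle_files[shuffle_id].items()}


data = list(range(10))
first = reduce_side(0, data)   # runs the map side and stores its output
second = reduce_side(0, data)  # finds the stored output and skips the map side
```

After both calls, `map_calls` is 1: the second computation starts from the stored shuffle output instead of going back through the whole lineage, which is the effect described above.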
The source of this answer is the O'Reilly book Learning Spark by Karau, Konwinski, Wendell, and Zaharia, Chapter 8: Tuning and Debugging Spark, section "Components of Execution: Jobs, Tasks, and Stages".