I am running a series of jobs in which an intermediate RDD is reused by every job, so I cached the intermediate RDDs. After a number of iterations, however, the jobs slow down. I then added RDD checkpointing after caching to break the lineage, which I no longer need. In the Spark UI I can confirm that the checkpoint is performed correctly, but it also takes time because every RDD is written to the local file system. What is an efficient way to break the unnecessary lineage without storing the actual RDD data?
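
A minimal sketch of the pattern described above, assuming Scala and a local master. The checkpoint directory, the `map` step and the iteration count are illustrative placeholders, not taken from the actual job:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local master, for illustration only.
    val conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical directory; reliable checkpointing writes the RDD data here.
    sc.setCheckpointDir("/tmp/spark-checkpoints")

    var rdd = sc.parallelize(1 to 1000000).map(_.toLong)

    for (_ <- 1 to 20) {
      // Each iteration derives a new RDD from the previous one,
      // so the lineage grows unless it is truncated.
      val next = rdd.map(_ + 1).cache()
      next.checkpoint() // must be called before the first action on this RDD
      next.count()      // action: fills the cache and writes the checkpoint
      rdd.unpersist()   // release the previous iteration's cached blocks
      rdd = next
    }

    // After checkpointing, the debug string shows the truncated lineage.
    println(rdd.toDebugString)
    sc.stop()
  }
}
```

The `count()` in every iteration is where the extra time goes: it triggers both the caching and the write of the full RDD to the checkpoint directory.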