RDDs are immutable, and transformations on an RDD create new RDDs. Therefore, there is no need to create copies of an RDD before applying different operations to it.
You can save the base RDD to secondary storage and then keep applying further operations to that same RDD.
This is normal:
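    val rdd = ???
    // Key the records once; keyBy is the RDD method for this.
    val base = rdd.keyBy(...)
    base.saveToCassandra(ks, table)

    // Derive from base (not from an undefined `byKey`) and save the result.
    val processed = base.map(...).reduceByKey(...)
    processed.saveToCassandra(ks, processedTable)

    // base is unchanged, so it can feed a second, independent pipeline.
    val analyzed = base.map(...).join(suspectsRDD).reduceByKey(...)
    analyzed.saveAsTextFile("./path/to/save")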
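For something that runs locally without Cassandra, the same pattern can be sketched roughly as below. The object name `ReuseBaseRdd`, the helper `derive`, the sample words and the `local[2]` master are all made up for illustration and are not part of the original snippet. The `cache()` call is optional: it only avoids recomputing `base` for each action.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; names and data are illustrative assumptions.
object ReuseBaseRdd {

  // Builds one keyed base RDD and derives two independent results from it.
  def derive(sc: SparkContext): (Map[Char, Int], Map[Char, Int], Long) = {
    val rdd = sc.parallelize(Seq("apple", "avocado", "banana", "blueberry", "cherry"))

    // Key each word by its first letter; `rdd` itself is not modified.
    val base = rdd.keyBy(word => word.head)
    base.cache() // optional: keeps `base` in memory across the actions below

    // First derived RDD: number of words per letter.
    val counts = base.mapValues(_ => 1).reduceByKey(_ + _)

    // Second derived RDD, built from the same `base`: total characters per letter.
    val lengths = base.mapValues(_.length).reduceByKey(_ + _)

    // `base` still holds every original keyed record after both derivations.
    (counts.collectAsMap().toMap, lengths.collectAsMap().toMap, base.count())
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("reuse-base-rdd").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (counts, lengths, total) = derive(sc)
      println(counts)
      println(lengths)
      println(total)
    } finally {
      sc.stop()
    }
  }
}
```

Neither `counts` nor `lengths` affects `base` or each other, which is why no defensive copy is needed before branching into several pipelines.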
maasg