How to duplicate an RDD into multiple RDDs?

Can I duplicate an RDD into two or more RDDs?

I want to use the cassandra-spark driver to save an RDD to a Cassandra table and, in addition, continue doing more calculations on it (and ultimately save those results to Cassandra as well).

1 answer

RDDs are immutable, and transformations on an RDD create new RDDs rather than modifying the original. Therefore, there is no need to make copies of an RDD in order to apply different operations to it.
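To illustrate the point about immutability, here is a minimal sketch that derives two independent RDDs from one parent. The object name `RddBranching`, the sample data, and the local master setting are assumptions made for the example; they are not part of the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddBranching {
  // Derives two RDDs from the same parent and returns
  // (parent, doubled, evens) as local arrays for inspection.
  def run(sc: SparkContext): (Array[Int], Array[Int], Array[Int]) = {
    val numbers = sc.parallelize(1 to 5)
    val doubled = numbers.map(_ * 2)        // new RDD; numbers is untouched
    val evens   = numbers.filter(_ % 2 == 0) // another new RDD from the same parent
    (numbers.collect(), doubled.collect(), evens.collect())
  }

  def main(args: Array[String]): Unit = {
    // Local mode is assumed here purely for demonstration.
    val conf = new SparkConf().setMaster("local[*]").setAppName("rdd-branching")
    val sc = new SparkContext(conf)
    try {
      val (parent, doubled, evens) = run(sc)
      println(parent.mkString(","))  // 1,2,3,4,5
      println(doubled.mkString(",")) // 2,4,6,8,10
      println(evens.mkString(","))   // 2,4
    } finally sc.stop()
  }
}
```

The parent RDD can be reused as often as needed, because neither `map` nor `filter` changes it.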

You can save the base RDD to secondary storage and then keep applying further operations to that same RDD.

This is perfectly fine:

    import com.datastax.spark.connector._  // provides saveToCassandra on RDDs

    val rdd = ???
    val base = rdd.keyBy(...)
    base.saveToCassandra(ks, table)

    // Both branches below start from the same base RDD.
    val processed = base.map(...).reduceByKey(...)
    processed.saveToCassandra(ks, processedTable)

    val analyzed = base.map(...).join(suspectsRDD).reduceByKey(...)
    analyzed.saveAsTextFile("./path/to/save")
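One caveat the snippet above does not address: each action (such as a save) re-evaluates the lineage of the RDD it is called on, so `base` may be recomputed once per branch. If that recomputation is expensive, caching the base RDD might help. The following is a sketch only: `CachedBase` and the sample words are hypothetical, and `count()` stands in for the Cassandra save so that the example runs without a cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CachedBase {
  // Returns the number of keyed records plus a per-key count,
  // both computed from the same cached base RDD.
  def run(sc: SparkContext): (Long, Map[String, Int]) = {
    val words = sc.parallelize(Seq("spark", "rdd", "spark", "cassandra"))

    // keyBy pairs each element with a key; cache() marks the result
    // to be kept in memory after it is first computed.
    val base = words.keyBy(w => w).cache()

    // First action: a stand-in for base.saveToCassandra(ks, table).
    val saved = base.count()

    // Second branch: reuses the cached base instead of recomputing it.
    val counts = base.mapValues(_ => 1).reduceByKey(_ + _).collectAsMap().toMap

    base.unpersist() // release the cached partitions when done
    (saved, counts)
  }

  def main(args: Array[String]): Unit = {
    // Local mode is assumed here purely for demonstration.
    val conf = new SparkConf().setMaster("local[*]").setAppName("cached-base")
    val sc = new SparkContext(conf)
    try {
      val (saved, counts) = run(sc)
      println(s"saved=$saved counts=$counts")
    } finally sc.stop()
  }
}
```

Whether caching pays off depends on how costly the base RDD is to rebuild and on how much memory is available; for a cheap lineage, recomputing may be just as good.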

Source: https://habr.com/ru/post/981210/

