I have two RDD[K,V], where K=Longand V=Object. Lets call rdd1and rdd2. I have a regular custom Partitioner. I am trying to find a way to take unionor joinby avoiding or minimizing data movement.
val kafkaRdd1 =
val kafkaRdd2 =
val rdd1 = kafkaRdd1.partitionBy(new MyCustomPartitioner(24))
val rdd2 = kafkaRdd2.partitionBy(new MyCustomPartitioner(24))
val rdd3 = rdd1.union(rdd2)
val rdd3 = rdd1.leftOuterjoin(rdd2)
Is it possible to assume (or a way to provide) nth-Partitionboth rdd1, and rdd2on the same slavenode?
source
share