I have two RDD[K,V], where K=Long and V=Object. Let's call them rdd1 and rdd2. I have a regular custom Partitioner. I am trying to find a way to take their union or join while avoiding or minimizing data movement.
val kafkaRdd1 =
val kafkaRdd2 =
val rdd1 = kafkaRdd1.partitionBy(new MyCustomPartitioner(24))
val rdd2 = kafkaRdd2.partitionBy(new MyCustomPartitioner(24))
val rdd3 = rdd1.union(rdd2)          // either a union ...
val rdd4 = rdd1.leftOuterJoin(rdd2)  // ... or a left outer join
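For context, here is a minimal sketch of what a partitioner like MyCustomPartitioner might look like. The real partitioning rule is not shown above, so the modulo rule below is purely an assumption for illustration. The part that matters for this question is equals/hashCode: as far as I understand, Spark compares the partitioner objects of the two RDDs to decide whether they are already co-partitioned, and two separate `new MyCustomPartitioner(24)` instances only compare equal if equals is overridden.

```scala
import org.apache.spark.Partitioner

// Hypothetical stand-in for MyCustomPartitioner; the actual logic is not shown in the question.
class MyCustomPartitioner(override val numPartitions: Int) extends Partitioner {
  require(numPartitions > 0, "numPartitions must be positive")

  // Assumed rule: non-negative modulo of the Long key.
  override def getPartition(key: Any): Int = key match {
    case k: Long => (((k % numPartitions) + numPartitions) % numPartitions).toInt
    case other   => throw new IllegalArgumentException(s"Unexpected key: $other")
  }

  // Two instances with the same partition count are treated as the same partitioner,
  // so rdd1.partitioner == rdd2.partitioner can hold for separately constructed instances.
  override def equals(other: Any): Boolean = other match {
    case p: MyCustomPartitioner => p.numPartitions == numPartitions
    case _                      => false
  }

  override def hashCode: Int = numPartitions
}
```

With a partitioner like this, the same key maps to the same partition index in both rdd1 and rdd2. That only aligns partition indices; whether the nth partitions also end up on the same node is the open question.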
Is it possible to assume (or is there a way to ensure) that the nth partition of both rdd1 and rdd2 is on the same slave node?