When I join two RDDs, where does the actual data go? Is it aggregated on the driver and then sent back to the worker nodes, or is one of the nodes randomly selected to "receive" the data? Also, if I call partition on a pair RDD, is it automatically partitioned using the key?