Join and Shuffle:
Joining two DataFrames, say A and B on a common key, normally forces a shuffle:
A.join(B, Seq("id"))
By default Spark knows nothing about how the rows of A and B are distributed across the cluster, so it hash-partitions both sides on the join key and moves rows with the same key onto the same executor over the network. This shuffle is usually the most expensive part of the join, and a Partitioner is the mechanism that lets Spark avoid it.
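The shuffle can be observed directly in the physical plan. A minimal sketch, assuming a local SparkSession and illustrative column names (none of these names come from the original):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("join-shuffle-demo")
  .getOrCreate()
import spark.implicits._

val A = Seq((1, "a"), (2, "b")).toDF("id", "info")
val B = Seq((1, "x"), (3, "y")).toDF("id", "extra")

// The plan shows an Exchange hashpartitioning(id, ...) node on
// both inputs of the join: both datasets get shuffled.
A.join(B, Seq("id")).explain()
```

Each Exchange node in the printed plan corresponds to one network shuffle of that input.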

HashPartitioner:
Calling partitionBy() on a key-value RDD tells Spark exactly how the data is laid out. If A has been hash-partitioned and persisted, join() can exploit that knowledge: in A.join(B), Spark does not re-shuffle A at all, but only shuffles B's RDD, sending each record of B to the machine that holds the matching hash partition of A. Since B is much smaller than A, only B's data has to cross the network:

import org.apache.spark.HashPartitioner

val A = sc.sequenceFile[Int, String]("hdfs://...")
  .partitionBy(new HashPartitioner(100)) // create 100 hash partitions
  .persist()                             // keep the partitioned RDD in memory

A.join(B) // only B is shuffled; A's partitioning is reused

Pre-partitioning the larger dataset once therefore pays off whenever it is joined repeatedly: each subsequent join shuffles only the smaller side.
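The same idea can be sketched with the DataFrame API: repartitioning on the join column and caching gives the optimizer a known output partitioning it can often reuse instead of re-exchanging the larger side. A hedged sketch, with assumed table and column names; whether the Exchange is actually skipped depends on the join strategy and partition-count settings:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("prepartition-demo")
  .getOrCreate()
import spark.implicits._

val A = Seq((1, "a"), (2, "b")).toDF("id", "info")
  .repartition($"id") // hash-partition A by the join key once
  .cache()            // keep the partitioned data in memory
A.count()             // materialize the cache

val B = Seq((1, "x")).toDF("id", "extra")

// Repeated joins on "id" can reuse A's cached partitioning,
// so the expensive exchange is mostly paid by the smaller B side.
A.join(B, Seq("id")).show()
```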