I use Spark 2.3.0 (actually 2.3.1-SNAPSHOT, built from the sources); I have also checked the behaviour with 2.1.2 and 2.3.0.
dfWithPar.show (an action of the Spark SQL Dataset API in Scala) triggers a Spark job, and this is the physical plan of the underlying structured query:
scala> dfWithPar.explain
== Physical Plan ==
*(1) Project [Id
+- *(1) BroadcastHashJoin [parentId
:- LocalTableScan [Id
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
+- LocalTableScan [Id
Why does this structured query trigger a Spark job at all?
In other words, what exactly does Spark execute here?

tl;dr: It is BroadcastHashJoinExec that is responsible for the Spark job you see when Dataset.show is executed.
Spark translates structured queries (described with the Dataset API) into computations over the RDD API.
Spark SQL's Datasets and Spark Core's RDDs both describe distributed computations in Spark. RDDs are Spark's "assembly language" (something like JVM bytecode), while Datasets are higher-level, SQL-like descriptions (something like Scala or Java for the JVM, which compile down to JVM bytecode).
In the end, every Dataset query becomes an RDD computation (just as Java and Scala programs become JVM bytecode).
The Dataset API is thus a high-level API over the RDD API: DataFrame and Dataset queries are ultimately executed as RDDs.
With that, Dataset.show is executed as RDD code, and like any RDD action it triggers Spark jobs.
Dataset.show (with the default numRows of 20) calls showString under the covers, which does take(numRows + 1) to get an Array[Row]:
val takeResult = newDf.select(castCols: _*).take(numRows + 1)
In other words, dfWithPar.show() is equivalent to dfWithPar.take(21), which in turn is equivalent to dfWithPar.head(21) as far as the Spark jobs are concerned.
You can see that in the SQL tab of the web UI, where the executed queries are listed together with their jobs.
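The numRows + 1 trick is easy to model outside Spark. Below is a plain-Scala sketch (names and structure are mine, not Spark's actual showString) of why one extra row is fetched: it tells show whether to print the "only showing top N rows" footer:

```scala
// Plain-Scala sketch (assumed, not Spark's real showString) of the
// take(numRows + 1) trick: fetch one extra row to detect truncation.
object ShowSketch {
  // Returns the rows to display and whether the output was truncated.
  def showString[T](rows: Seq[T], numRows: Int = 20): (Seq[T], Boolean) = {
    val takeResult = rows.take(numRows + 1)      // mirrors take(numRows + 1)
    val truncated  = takeResult.length > numRows // the extra row is the signal
    (takeResult.take(numRows), truncated)
  }
}
```

For a 30-element input the sketch returns the first 20 rows plus a truncated flag of true; for inputs of 20 rows or fewer the flag stays false.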

show, take and head all lead to collectFromPlan, which triggers a Spark job (via executeCollect).
So I'm fairly sure you would see the very same jobs if you ran any of those actions instead; that's simply how Spark executes a structured query.

BroadcastHashJoin and BroadcastExchangeExec
BroadcastHashJoinExec is the physical operator used when the size of one side of the join is small enough to be broadcast (the threshold is configurable with spark.sql.autoBroadcastJoinThreshold, 10M by default).
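As a rough sketch (my simplification, not Spark's actual join-planning code), the decision boils down to a size check against the threshold; a non-positive threshold disables broadcast joins:

```scala
// Simplified model of the broadcast-join decision (an assumption, not the
// real planner code): a side qualifies if its estimated size fits under
// spark.sql.autoBroadcastJoinThreshold (10 MB by default; <= 0 disables it).
object BroadcastDecision {
  val DefaultThresholdBytes: Long = 10L * 1024 * 1024

  def canBroadcast(sideSizeInBytes: Long,
                   thresholdBytes: Long = DefaultThresholdBytes): Boolean =
    thresholdBytes > 0 && sideSizeInBytes <= thresholdBytes
}
```

A 1 KB lookup table qualifies; an 11 MB one does not, and setting the threshold to -1 turns the optimization off entirely.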
BroadcastExchangeExec is the physical operator that broadcasts the rows of one join side (so that BroadcastHashJoinExec can use them).
When BroadcastHashJoinExec is executed (to generate an RDD[InternalRow]), it creates a broadcast variable, which in turn executes BroadcastExchangeExec, on a separate thread.
That is why Spark job 0 is reported as run at ThreadPoolExecutor.java:1149: the broadcast is prepared on a separate thread pool.
You can see the same job if you execute:
val r = dfWithPar.rdd
That forces the structured query to be executed, broadcast included, in order to produce the RDD.
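The pattern behind that job name can be sketched in plain Scala (names like buildRelation are mine, not Spark's API): the work of collecting and hashing the small join side is submitted to a dedicated thread pool, so the resulting job is attributed to the pool's thread rather than to user code:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Sketch of BroadcastExchangeExec's pattern (assumed names, not Spark's API):
// build the broadcast relation on a separate thread pool, so the triggered
// work runs "at ThreadPoolExecutor", not on the thread that called show.
object BroadcastThreadSketch {
  def buildRelation(rows: Seq[(Int, String)]): Map[Int, String] = {
    val pool = Executors.newFixedThreadPool(1)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    // The "collect and hash" step runs asynchronously on the pool's thread.
    val relationFuture: Future[Map[Int, String]] = Future { rows.toMap }
    val relation = Await.result(relationFuture, 10.seconds)
    pool.shutdown()
    relation
  }
}
```

The caller only blocks on the future's result; the hashing itself happens on the pool, which is exactly why the job description names ThreadPoolExecutor.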

That accounts for the broadcast job; the remaining jobs come from how Spark fetches rows.
RDD.take
What I did not connect at first, even though I knew the individual pieces, is that Dataset actions like show, take and head eventually end up in RDD.take.
take(num: Int): Array[T] takes the first num elements of the RDD. Its scaladoc notes that it "works by first scanning one partition, and use the results from that partition to estimate the number of additional partitions needed to satisfy the limit".
That sentence is the key to understanding the number of jobs: each scan round is a separate Spark job, and the number of partitions scanned grows by a factor of up to 4 per round, as you can see in the code:
// RDD.take
def take(num: Int): Array[T] = withScope {
...
while (buf.size < num && partsScanned < totalParts) {
...
val res = sc.runJob(this, (it: Iterator[T]) => it.take(left).toArray, p)
...
}
}
In our case, RDD.take fetches 21 rows (numRows + 1). If you run it yourself:
r.take(21)
you will see 2 Spark jobs in the web UI, exactly as dfWithPar.show produced.
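The job arithmetic can be reproduced with a pure-Scala simulation of take's loop. Assumptions of mine: scaleUpFactor = 4 (the default of spark.rdd.limit.scaleUpFactor), and exact per-partition row counts are known up front, which the real estimator only discovers as it scans:

```scala
// Pure-Scala simulation (an approximation, not Spark's exact code) of
// RDD.take's scan-escalation loop; every loop iteration is one sc.runJob,
// i.e. one Spark job in the web UI.
object TakeSimulator {
  /** Returns (rowsCollected, jobsRun) for take(num) over the given partition sizes. */
  def simulate(partitionSizes: Seq[Int], num: Int, scaleUpFactor: Int = 4): (Int, Int) = {
    val totalParts = partitionSizes.length
    var collected = 0
    var partsScanned = 0
    var jobs = 0
    while (collected < num && partsScanned < totalParts) {
      // First pass scans a single partition; later passes estimate from what
      // the scanned partitions yielded, capped at scaleUpFactor growth.
      val numPartsToTry: Long =
        if (partsScanned == 0) 1L
        else if (collected == 0) partsScanned.toLong * scaleUpFactor
        else {
          val estimate = math.ceil(1.5 * num * partsScanned / collected).toInt - partsScanned
          math.min(math.max(estimate, 1).toLong, partsScanned.toLong * scaleUpFactor)
        }
      val p = partsScanned until math.min(partsScanned + numPartsToTry, totalParts.toLong).toInt
      val left = num - collected
      jobs += 1 // one runJob over the partitions in p
      collected = math.min(num, collected + p.map(i => math.min(partitionSizes(i), left)).sum)
      partsScanned += p.size
    }
    (collected, jobs)
  }
}
```

Assuming, for illustration, four partitions of 10 rows each: simulate(Seq.fill(4)(10), 21) collects all 21 rows in 2 jobs, matching the two jobs above, while taking a single row finishes in one job. The real job count depends on how the rows are distributed across partitions.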

If you want to test how well you now understand Spark's internals, try running dfWithPar.show(1).
How many Spark jobs do you expect?
One, because only a single row is requested? Or perhaps zero? Make your guess first.
Then check the web UI, remembering that under the covers Spark translates show into RDD.take again.
I'm leaving the explanation as a (home) exercise; working it out is a good way to learn how Spark executes structured queries.