Generally speaking, transferring all the data to the driver is a bad idea, and most of the time there is a better solution. If you really need to do it, though, you can use the `toLocalIterator` method on the underlying RDD. It returns an iterator that fetches one partition at a time, so the driver only ever holds a single partition's worth of data:
val df: org.apache.spark.sql.DataFrame = ???
df.cache() // Optional: avoids recomputing the lineage for every partition fetched; see the docs for details
val iter: Iterator[org.apache.spark.sql.Row] = df.rdd.toLocalIterator
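A minimal self-contained sketch of the same idea, assuming a local Spark installation (the object name, master setting, and toy data are illustrative, not part of the question):

```scala
import org.apache.spark.sql.SparkSession

object ToLocalIteratorDemo {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("toLocalIterator demo")
      .getOrCreate()
    import spark.implicits._

    val df = (1 to 10).toDF("n")
    df.cache() // avoid recomputation while partitions are pulled one by one

    // toLocalIterator fetches one partition at a time,
    // so the driver never holds the whole dataset at once
    df.rdd.toLocalIterator.foreach { row =>
      println(row.getInt(0))
    }

    spark.stop()
  }
}
```

Note that on Spark 2.0+ the `Dataset` API also exposes `toLocalIterator` directly, returning a `java.util.Iterator[T]` rather than a Scala `Iterator[Row]`.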