This is Steffen Schmitz's answer, improved, since the original is actually incorrect as written. I cleaned it up and summarized it for posterity. However, I do wonder about its performance at scale.
// In spark-shell these implicits are already in scope; otherwise import them
// from your SparkSession (assumed here to be available as `spark`)
import spark.implicits._

var numberOfNew = 4
var input = Seq((1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12)).toDF
val newFrames = (0 until numberOfNew).map(_ => Seq.empty[(Int, Int)].toDF).toArray

val size = input.count()
val limit = (size / numberOfNew).toInt      // base number of rows per split
val residual = (size % numberOfNew).toInt   // leftover rows; the last split absorbs them

var limit_to_use = limit
while (numberOfNew > 0) {
  // the final split takes its base share plus the remainder so no rows are dropped
  if (numberOfNew == 1 && residual != 0) limit_to_use = limit + residual
  newFrames(numberOfNew - 1) = input.limit(limit_to_use)
  // remove the rows just taken from the remaining input
  input = input.except(newFrames(numberOfNew - 1))
  numberOfNew = numberOfNew - 1
}

newFrames.foreach(_.show)

// reassemble to confirm that the union of the splits covers the whole input
val singleDF = newFrames.reduce(_ union _)
singleDF.show(false)
It returns the individual data frames:
+---+---+
| _1| _2|
+---+---+
| 7| 8|
| 3| 4|
| 11| 12|
+---+---+
+---+---+
| _1| _2|
+---+---+
| 5| 6|
+---+---+
+---+---+
| _1| _2|
+---+---+
| 9| 10|
+---+---+
+---+---+
| _1| _2|
+---+---+
| 1| 2|
+---+---+
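On the scale question: every iteration removes the rows it just took with except, which (as far as I understand) makes Spark compare the remaining data against each split, so the cost grows with the number of splits. If the splits only need to be roughly equal rather than exact, Spark's built-in randomSplit does the whole job in a single pass. A minimal sketch, assuming a SparkSession named spark is available:

import spark.implicits._

val df = Seq((1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12)).toDF
val n = 4

// Equal weights give roughly equal split sizes; the exact row counts vary per run,
// so this is not a drop-in replacement when exact split sizes are required.
val parts = df.randomSplit(Array.fill(n)(1.0), 42L)
parts.foreach(_.show())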