I know that I can do random splitting with the randomSplit method:
val splittedData: Array[Dataset[Row]] = preparedData.randomSplit(Array(0.5, 0.3, 0.2))
Is it possible to split the data into sequential parts using some nonRandomSplit-style method?
I'm using Apache Spark 2.0.1. Thanks in advance.
UPD: the data order is important; I'm going to train my model on the data with "smaller identifiers" and test it on the data with "larger identifiers", so I want to split the data into sequential parts without shuffling.
e.g.
my dataset        = (0,1,2,3,4,5,6,7,8,9)
desired splitting = (0.8, 0.2)
resulting parts   = (0,1,2,3,4,5,6,7), (8,9)
The only solution I can think of is to use count and limit, but there is probably a better way.
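Something along these lines is what I have in mind (a rough, untested sketch; it uses count plus zipWithIndex rather than limit, since limit only returns a leading prefix of the data; the helper name sequentialSplit and the idCol parameter are my own, not part of the Spark API):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Sketch: split a DataFrame into consecutive slices in id order.
// `weights` behave like randomSplit's weights and are assumed to sum to 1.0.
def sequentialSplit(df: DataFrame, weights: Array[Double], idCol: String): Array[DataFrame] = {
  val total = df.count()
  // Cumulative row boundaries: (0.8, 0.2) over 10 rows -> 0, 8, 10.
  val bounds = weights.scanLeft(0.0)(_ + _).map(w => (w * total).round)
  // Attach a sequential index in id order, then slice on that index.
  val indexed = df.orderBy(col(idCol)).rdd.zipWithIndex()
  bounds.sliding(2).map { case Array(lo, hi) =>
    val slice = indexed.filter { case (_, i) => i >= lo && i < hi }.map(_._1)
    df.sparkSession.createDataFrame(slice, df.schema)
  }.toArray
}

// Usage with the example above:
// val Array(train, test) = sequentialSplit(preparedData, Array(0.8, 0.2), "id")

It does cost a full sort plus two Spark jobs (one for count, one for zipWithIndex), so I'd be happy to hear about something cheaper.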