Regular Scala collections have a nifty collect method that lets me perform a filter-map operation in a single pass using a partial function. Is there an equivalent operation on Spark Datasets?
I would like this for two reasons:
- syntactic simplicity
- it reduces filter-map style operations to a single pass (although in Spark I assume there are optimizations that spot these things for you)
Here is an example to show what I mean. Suppose I have a sequence of options, and I want to extract and double only the defined integers (those wrapped in a Some):
val input = Seq(Some(3), None, Some(-1), None, Some(4), Some(5))
Method 1 - collect
input.collect {
case Some(value) => value * 2
}
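// result: List(6, -2, 8, 10)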
collect handles this quite neatly syntactically and does it in one pass.
Method 2 - filter-map
input.filter(_.isDefined).map(_.get * 2)
I can carry this over to Spark, since Datasets and DataFrames have analogous methods. But I don't like it as much, because isDefined and get feel like code smells to me: there is an implicit assumption that map only ever receives Somes.
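For concreteness, here is roughly what Method 2 looks like carried over to Spark. This is a minimal sketch, not taken from any real code: the SparkSession setup and the names spark, ds, and doubled are mine, and it assumes the implicit encoders from spark.implicits cover Option[Int] (which I believe they do for common element types):

import org.apache.spark.sql.{Dataset, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// input is the Seq[Option[Int]] from above
val ds: Dataset[Option[Int]] = input.toDS()

// same isDefined/get smell as the plain-Scala version
val doubled: Dataset[Int] = ds.filter(_.isDefined).map(_.get * 2)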
Method 3 - fold*
input.foldRight[List[Int]](Nil) {
case (nextOpt, acc) => nextOpt match {
case Some(next) => next*2 :: acc
case None => acc
}
}
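// result: List(6, -2, 8, 10), same as collect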
I haven't used Spark enough to know whether the fold family has an analogue there, so this may be a bit tangential. Either way, I find this version the hardest to read: the pattern match, the fold boilerplate, and the rebuilding of the list all get jumbled together.
So overall, I find the collect syntax the neatest, and I'm hoping there is something like it for Spark Datasets.
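For what it's worth, the closest single-pass equivalent I have found is flatMap over a lifted partial function. This is only a sketch, reusing the hypothetical ds from the Method 2 example above, and not necessarily the idiomatic answer:

val pf: PartialFunction[Option[Int], Int] = { case Some(value) => value * 2 }

// lift turns the partial function into a total Option[Int] => Option[Int];
// toSeq turns each result into the TraversableOnce that flatMap expects,
// so undefined cases simply vanish: filter and map in one pass
val doubledViaPf: Dataset[Int] = ds.flatMap(x => pf.lift(x).toSeq)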