If I have a simple List of Scala Ints and I define a simple isPositive method that returns true when the value is greater than 0, I can pass the method directly to the collection's filter method, as in the example below:
def isPositive(i: Int): Boolean = i > 0

val aList = List(-3, -2, -1, 1, 2, 3)
val newList = aList.filter(isPositive)
> newList: List[Int] = List(1, 2, 3)
So, as I understand it, the compiler can automatically convert the method into a function value by performing eta expansion, and then pass that function as an argument.
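For clarity, this is what I mean by eta expansion, shown in plain Scala with no Spark involved (just a sketch of the explicit and automatic forms):

def isPositive(i: Int): Boolean = i > 0

// Explicit eta expansion with the trailing underscore:
val f1: Int => Boolean = isPositive _

// Automatic eta expansion, because the expected type Int => Boolean is a function type:
val f2: Int => Boolean = isPositive

List(-1, 0, 1).filter(f1)  // List(1)
List(-1, 0, 1).filter(f2)  // List(1)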
However, if I do the same with a Spark Dataset:
val aDataset = aList.toDS
val newDataset = aDataset.filter(isPositive)
> error
It does not compile, failing with the well-known "missing arguments for method ..." error. To make it work, I have to explicitly convert the method to a function using "_":
val newDataset = aDataset.filter(isPositive _)
> newDataset: org.apache.spark.sql.Dataset[Int] = [value: int]
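For what it's worth, wrapping the call in an explicit lambda also compiles in the same session, which I assume is equivalent to the eta-expanded form:

// Sketch against the same aDataset as above
val viaEta    = aDataset.filter(isPositive _)
val viaLambda = aDataset.filter((i: Int) => isPositive(i))
> viaLambda: org.apache.spark.sql.Dataset[Int] = [value: int]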
Yet it works with map as expected:
val newDataset = aDataset.map(isPositive)
> newDataset: org.apache.spark.sql.Dataset[Boolean] = [value: boolean]
Examining the signatures, I see that the signature of Dataset's filter is very similar to that of List's filter:
// Dataset:
def filter(func: T => Boolean): Dataset[T]

// List (defined in TraversableLike):
def filter(p: A => Boolean): Repr
So why doesn't the compiler perform eta expansion for the Dataset filter operation?