Dataset filter: eta expansion is not performed automatically

If I have a simple list of Scala Ints and define a simple isPositive method that returns true when the value is greater than 0, I can pass the method directly to the collection's filter method, as in the example below:

    def isPositive(i: Int): Boolean = i > 0
    val aList = List(-3, -2, -1, 1, 2, 3)
    val newList = aList.filter(isPositive)
    // newList: List[Int] = List(1, 2, 3)

So, as I understand it, the compiler automatically converts the method into a function instance by performing eta expansion, and then passes that function as the argument.
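A minimal sketch of that conversion in plain Scala, with no Spark involved. The object and value names (EtaExpansionDemo, asFunction, explicit, byHand) are made up for illustration; the three values show the method being turned into a function by an expected type, by the explicit `_` syntax, and by a hand-written lambda that is roughly what eta expansion produces.

```scala
object EtaExpansionDemo {
  def isPositive(i: Int): Boolean = i > 0

  // The expected type Int => Boolean lets the compiler eta-expand the method.
  val asFunction: Int => Boolean = isPositive

  // Explicit conversion with the trailing underscore.
  val explicit = isPositive _

  // Roughly the function value that eta expansion stands for.
  val byHand: Int => Boolean = (i: Int) => isPositive(i)

  val aList = List(-3, -2, -1, 1, 2, 3)

  // All three function values filter the list the same way.
  val results = List(
    aList.filter(asFunction),
    aList.filter(explicit),
    aList.filter(byHand)
  )
}
```

List.filter has a single, non-overloaded signature, so its parameter type supplies the expected type and passing the bare method name is enough there.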

However, if I do the same with the Spark dataset:

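    import spark.implicits._  // needed for toDS; assumes a SparkSession named spark, as in spark-shell
    val aDataset = aList.toDS
    val newDataset = aDataset.filter(isPositive)
    // error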

It fails with the well-known error "missing arguments for method". To make it work, I have to explicitly convert the method to a function using "_":

    val newDataset = aDataset.filter(isPositive _)
    // newDataset: org.apache.spark.sql.Dataset[Int] = [value: int]

With map, however, it works as expected:

    val newDataset = aDataset.map(isPositive)
    // newDataset: org.apache.spark.sql.Dataset[Boolean] = [value: boolean]

Examining the signatures, I see that the signature for the Dataset filter is very similar to the List filter:

    // Dataset:
    def filter(func: T => Boolean): Dataset[T]
    // List (defined in TraversableLike):
    def filter(p: A => Boolean): Repr

So why does the compiler not perform eta expansion for the Dataset filter operation?

1 answer

This is due to the nature of overloaded methods and eta expansion. The question "Eta-expansion between methods and functions with overloaded methods in Scala" explains why this fails.

The gist of it is the following (emphasis mine):

when overloaded, applicability is undermined because there is no expected type (6.26.3, infamously). When not overloaded, 6.26.2 applies (eta expansion) because the type of the parameter determines the expected type. When overloaded, the arg is specifically typed with no expected type, hence 6.26.2 doesn't apply; therefore neither overloaded variant of d is deemed to be applicable.

.....

Candidates for overloading resolution are pre-screened by "shape". The shape test encapsulates the intuition that eta expansion is never used because args are typed without an expected type. This example shows that eta expansion is not used even when it is "the only way for the expression to type check."
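The situation can be sketched without Spark. FilterBox below is a hypothetical stand-in, not Spark's Dataset: it only has two one-argument filter overloads, loosely modelled on Dataset.filter(func: T => Boolean) and Dataset.filter(conditionExpr: String), and its String overload is a placeholder that ignores its argument. On the compiler version used in the question, the commented-out call is expected to be rejected the same way; newer Scala versions may behave differently. The remaining lines show three ways of giving the compiler a function value directly.

```scala
// Hypothetical stand-in for a class with overloaded one-argument methods.
class FilterBox[T](val items: List[T]) {
  def filter(f: T => Boolean): FilterBox[T] = new FilterBox(items.filter(f))

  // Second overload with the same number of arguments.
  // Placeholder only: it ignores expr and returns the box unchanged.
  def filter(expr: String): FilterBox[T] = this
}

object OverloadDemo {
  def isPositive(i: Int): Boolean = i > 0

  val box = new FilterBox(List(-3, -2, -1, 1, 2, 3))

  // The form reported as failing in the question: the argument is typed
  // without an expected type, so the bare method name is not eta-expanded.
  // val broken = box.filter(isPositive)

  // Explicit eta expansion with the trailing underscore.
  val viaUnderscore = box.filter(isPositive _)

  // A lambda is already a function value.
  val viaLambda = box.filter((i: Int) => isPositive(i))

  // A type ascription supplies the expected type that overloading removed.
  val viaAscription = box.filter(isPositive: Int => Boolean)
}
```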

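As @DanielDePaula points out, we don't see this effect in Dataset.map because its other overload takes an additional Encoder[U] parameter. Only one alternative accepts a single argument, so it is selected by argument count alone and the argument is then typed against its parameter type. Dataset.filter, by contrast, has several one-argument overloads. The two map overloads are: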

    def map[U : Encoder](func: T => U): Dataset[U] = withTypedPlan {
      MapElements[T, U](func, logicalPlan)
    }

    def map[U](func: MapFunction[T, U], encoder: Encoder[U]): Dataset[U] = {
      implicit val uEnc = encoder
      withTypedPlan(MapElements[T, U](func, logicalPlan))
    }
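A Spark-free sketch of the same arrangement, assuming simplified stand-ins: Enc, MapFn and MapBox are hypothetical types that only mimic the shape of Encoder, MapFunction and Dataset. Because the second overload needs two arguments, a one-argument call can only match the first one, and the bare method name is then eta-expanded against T => U.

```scala
// Hypothetical stand-ins; these are not Spark's Encoder, MapFunction or Dataset.
trait Enc[U]
object Enc {
  implicit val booleanEnc: Enc[Boolean] = new Enc[Boolean] {}
}

trait MapFn[T, U] { def call(t: T): U }

class MapBox[T](val items: List[T]) {
  // One explicit argument, plus an implicit Enc[U] from the context bound.
  def map[U: Enc](f: T => U): MapBox[U] = new MapBox(items.map(f))

  // Two explicit arguments, so a one-argument call cannot select this overload.
  def map[U](f: MapFn[T, U], enc: Enc[U]): MapBox[U] =
    new MapBox(items.map(t => f.call(t)))
}

object MapArityDemo {
  def isPositive(i: Int): Boolean = i > 0

  // Only the first overload fits a single argument, so its parameter type
  // T => U becomes the expected type and the method name is eta-expanded.
  val mapped = new MapBox(List(-1, 1, 2)).map(isPositive)
}
```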

Source: https://habr.com/ru/post/1270716/

