Why is a Scala method serializable while a function is not?

I have a Spark RDD defined as follows:

val dataset = CreateRDD(data.filter(someFilter))

I noticed the following:

// If the filter is defined as a function value, as below,
// Spark throws a "Task not serializable" exception:
val someFilter = (some) => true
// If the filter is defined as a method, as below, everything works:
def someFilter(some) = true

Why?

Yes, the function and the method are both defined as members of a test specification.

1 answer

The difference is that this:

val isNegative = (num: Int) => num < 0

is just syntactic sugar for this:

val isNegative = new Function1[Int, Boolean] {
  def apply(num: Int): Boolean = num < 0
}

Function1 is a trait, and the anonymous class created here is not serializable. When you have something like this:

object Tests {
  def isNegative(num: Int): Boolean = num < 0
}

Now isNegative is a member of Tests, which is serializable. When you call this:

val dataset = CreateRDD(data.filter(isNegative))

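isNegative is resolved as a member of Tests. In other words, with def the call works, but with val Spark must serialize the isNegative function object itself, which fails.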
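The claim about the explicit anonymous Function1 class can be checked without Spark. Below is a minimal sketch that uses plain Java serialization, which is what Spark uses for closures by default. The names SerializationCheck, isSerializable, plain and marked are made up for this illustration and are not part of the original question or of Spark. It only covers the desugared form written out above; how a given compiler version treats the lambda syntax itself is not shown here.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper object, introduced only for this illustration.
object SerializationCheck {

  // Returns true if Java serialization accepts the value, false if it
  // rejects it with NotSerializableException.
  def isSerializable(value: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(value)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  // Explicit anonymous class: Function1 by itself does not extend
  // Serializable, so this instance is rejected.
  val plain: Int => Boolean = new Function1[Int, Boolean] {
    def apply(num: Int): Boolean = num < 0
  }

  // Same predicate with Serializable mixed in explicitly.
  val marked: Int => Boolean = new Function1[Int, Boolean] with Serializable {
    def apply(num: Int): Boolean = num < 0
  }
}
```

Calling SerializationCheck.isSerializable on plain and on marked shows the difference: only the instance that mixes in Serializable passes. A predicate that ends up holding a reference to a non-serializable object, such as an enclosing test specification, would be rejected for the same reason.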


Source: https://habr.com/ru/post/1677999/

