Background
Here is my situation: I am trying to create a class that filters RDD based on some content function, but this function may differ in different scenarios, so I would like to parameterize it using the function. Unfortunately, I seem to be having problems with how Scala captures its closures. Even though my function is serializable, the class is not.
From the example in, it seems that my situation cannot be resolved, but I am convinced there is a way to achieve what I am trying to do by creating the correct (smaller) closure.
My code
class MyFilter(getFeature: Element => String, other: NonSerializable) {
def filter(rdd: RDD[Element]): RDD[Element] = {
rdd.filter { elem => getFeature(elem) == "myTargetString" }
}
Simplified example
class Foo(f: Int => Double, rdd: RDD[Int]) {
def go(data: RDD[Int]) = data.map(f)
}
val works = new Foo(_.toDouble, otherRdd)
works.go(myRdd).collect() // works
val myMap = Map(1 -> 10d)
val complicatedButSerializableFunc: Int => Double = x => myMap.getOrElse(x, 0)
val doesntWork = new Foo(complicatedButSerializableFunc, otherRdd)
doesntWork.go(myRdd).collect() // craps out
org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: $iwC$$iwC$Foo
Serialization stack:
- object not serializable (class: $iwC$$iwC$Foo, value: $iwC$$iwC$Foo@61e33118)
- field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC, name: foo, type: class $iwC$$iwC$Foo)
- object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC, $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC@47d6a31a)
- field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1, name: $outer, type: class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC)
- object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1, <function1>)
// Even though
val out = new ObjectOutputStream(new FileOutputStream("test.obj"))
out.writeObject(complicatedButSerializableFunc) // works
Questions
- Why doesn't the first simplified example try to serialize everything
Foo
, but does the second do? Foo
?