I understand the common "Task not serializable" problem, which occurs when a closure accesses a field or method defined outside of it.
To work around it, I usually make a local copy of such fields or methods, which avoids serializing the whole class:
```scala
class MyClass(val myField: Any) {
  def run() = {
    val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
    // Local copy: the closure below captures this value instead of `this`
    val myField = this.myField
    println(f.map( _ + myField ).count)
  }
}
```
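To see why the local copy helps without starting a Spark cluster, here is a minimal plain-Scala sketch. It assumes Scala 2.12 or 2.13 and uses plain Java serialization as a stand-in for Spark's default closure serializer; `SerializationCheck`, `isSerializable` and `FieldHolder` are hypothetical names made up for this illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Hypothetical helper: attempts plain Java serialization of `obj`,
  // roughly what happens to a closure before Spark ships it in a task.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(obj) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}

// Stand-in for the question's MyClass; deliberately NOT Serializable.
class FieldHolder(val myField: Any) {
  // `myField` here means `this.myField`, so the closure captures `this`.
  def closureUsingField: String => String = line => line + myField

  // The field is copied into a local val first; the closure captures only that value.
  def closureUsingLocalCopy: String => String = {
    val localField = myField
    line => line + localField
  }
}
```

With `new FieldHolder("x")`, `closureUsingField` fails to serialize because it drags the non-serializable `FieldHolder` instance along, while `closureUsingLocalCopy` only carries the copied `String`.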
However, if I define a nested function inside the run method, the task can no longer be serialized:
```scala
class MyClass() {
  def run() = {
    val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
    def mapFn(line: String) = line.split(";")
    println(f.map( mapFn( _ ) ).count)
  }
}
```
I don't understand this, since I thought mapFn would be in scope. Even stranger, if I define mapFn as a val instead of a def, it works:
```scala
class MyClass() {
  def run() = {
    val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
    val mapFn = (line: String) => line.split(";")
    println(f.map( mapFn( _ ) ).count)
  }
}
```
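The `def` versus `val` difference comes down to methods versus function values. A `def` is not an object: Scala 2 lifts a nested `def` into a private method of the enclosing class, and `mapFn(_)` is eta-expanded into a lambda that calls that method. Whether that lambda then needs `this` can depend on the compiler version, so this sketch uses an ordinary member method, where the capture of `this` is certain, next to a function `val`. It assumes Scala 2.12 or 2.13; `Splitter` and `SerializationCheck` are hypothetical names.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Hypothetical helper: true if plain Java serialization of `obj` succeeds.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(obj) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}

// Stand-in for the question's MyClass; NOT Serializable.
class Splitter {
  // A method: it only exists as a member of Splitter, so calling it needs an instance.
  def mapFnMethod(line: String): Array[String] = line.split(";")

  // Eta-expansion: mapFnMethod(_) becomes x => this.mapFnMethod(x), capturing `this`.
  def viaMethod: String => Array[String] = mapFnMethod(_)

  // A function value: a standalone object whose body does not reference `this`.
  def viaFunctionValue: String => Array[String] = {
    val mapFn = (line: String) => line.split(";")
    mapFn(_) // captures only the local `mapFn` object
  }
}
```

Serializing `viaMethod` pulls in the whole `Splitter`, whereas `viaFunctionValue` only carries the small function object, which matches the behaviour described for `val mapFn`.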
Is this related to the way Scala represents nested functions?
What is the recommended way to deal with this problem? Avoid nested functions?
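One commonly used alternative to nested functions, and the approach the Spark programming guide suggests alongside anonymous functions, is to keep such helpers as methods of a top-level singleton `object`. Calls to an object's methods go through a global reference, so the closure does not need the enclosing instance. A minimal sketch, assuming Scala 2.12 or 2.13; `LineParser`, `Job` and `SerializationCheck` are hypothetical names:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Hypothetical helper: true if plain Java serialization of `obj` succeeds.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(obj) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}

// Hypothetical helper object holding the parsing logic.
object LineParser {
  def mapFn(line: String): Array[String] = line.split(";")
}

// Stand-in for the question's MyClass; still NOT Serializable.
class Job {
  // The lambda reaches LineParser through its global singleton reference,
  // so it captures neither `this` nor any other value.
  def parser: String => Array[String] = line => LineParser.mapFn(line)
}
```

Inside `run`, the call would then read `f.map(line => LineParser.mapFn(line))`. Another option is to declare `MyClass extends Serializable`, at the cost of shipping the whole instance with every task.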