Why is extracting an argument into a local variable considered safer in Spark?

I saw this example in the book Learning Spark: Lightning-Fast Big Data Analysis:

import org.apache.spark.rdd.RDD

class SearchFunctions(val query: String) {
  // more methods here
  def getMatchesNoReference(rdd: RDD[String]): RDD[Array[String]] = {
    // Safe: extract just the field we need into a local variable
    val query_ = this.query
    // split returns Array[String], so the result is an RDD[Array[String]]
    rdd.map(x => x.split(query_))
  }
}

My question is about the comment, which says: "Safe: extract just the field we need into a local variable".

Why is using a local variable safer than using the field itself (which is defined as a val)?

+4
3 answers

The "Passing Functions to Spark" section of the Spark programming guide is really helpful and answers your question.

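The idea is that Spark runs on a cluster, so any function you pass to it has to be serialized on the driver and shipped to the executors, together with everything the function references.

When you pass a function to a transformation (such as map()) and that function reads a field of the enclosing object, the guide warns:

... accessing fields of the outer object will reference the whole object.

So the whole object has to be serialized and sent along with the function, even though the function needs only one field.

If that object is large and you are running on a real cluster (for example, on YARN), shipping it with every task wastes time and network bandwidth, and it can even make the job fail!

Such failures are hard to diagnose, too. The symptom may be as indirect as a TCP connection reset, with nothing in the error that points at the closure.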

+4

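Because when you use query_, only that string has to be serialized.

Otherwise, Spark has to serialize the whole SearchFunctions object.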
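To see the difference between capturing only query_ and capturing the whole SearchFunctions instance, here is a minimal sketch that needs no Spark at all. It uses plain Java serialization, which is roughly what Spark applies to a closure before shipping it. The object name ClosureCaptureDemo, the two method names, and the isSerializable helper are made up for this illustration, and the class is deliberately left non-serializable.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCaptureDemo {
  // Hypothetical stand-in for the book's class; deliberately NOT Serializable.
  class SearchFunctions(val query: String) {
    // Reads the field, so the closure captures `this` (the whole instance).
    def splitWithFieldReference: String => Array[String] =
      x => x.split(query)

    // Copies the field first, so the closure captures only a String.
    def splitWithLocalCopy: String => Array[String] = {
      val query_ = this.query
      x => x.split(query_)
    }
  }

  // Returns true if the closure survives Java serialization.
  def isSerializable(f: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(f)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val s = new SearchFunctions(",")
    println(isSerializable(s.splitWithFieldReference))
    println(isSerializable(s.splitWithLocalCopy))
  }
}
```

On Scala 2.12 or later, the first check should fail (the closure drags the non-serializable instance along) and the second should succeed (only the string is captured).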

+2

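In addition to the other answers: when a lambda references a field, it captures this, so the whole enclosing object has to be serialized, along with everything that object references, directly or indirectly. Java serialization fails if any part of that object graph is not serializable. There are many StackOverflow questions about exactly this: the job dies with java.io.NotSerializableException because, for example, the class holds a reference to the SparkContext somewhere, and SparkContext is not serializable, so Spark cannot ship the closure. Spark reports the NotSerializableException even though the lambda never touches that field, which makes the cause hard to spot.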

Or the object may be serializable today, but a seemingly unrelated change (for example, adding a field that the lambda never uses) can break your code by making the object non-serializable, or significantly hurt performance by making it much larger.
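The performance side can be sketched the same way, again without Spark. In this illustration the class is serializable, but it carries a large field that the lambda never reads. The names ClosureSizeDemo, cache, and serializedSize are hypothetical, and the 1,000,000-byte array is an arbitrary size chosen to make the effect visible.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object ClosureSizeDemo {
  // Hypothetical class: serializable, but with a big field unrelated to the lambda.
  class SearchFunctions(val query: String) extends Serializable {
    val cache: Array[Byte] = new Array[Byte](1000000)

    // Captures `this`, so `cache` is serialized along with the closure.
    def splitWithFieldReference: String => Array[String] =
      x => x.split(query)

    // Captures only the local String copy.
    def splitWithLocalCopy: String => Array[String] = {
      val query_ = this.query
      x => x.split(query_)
    }
  }

  // Number of bytes Java serialization produces for the closure.
  def serializedSize(f: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(f)
    out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    val s = new SearchFunctions(",")
    println(s"field reference: ${serializedSize(s.splitWithFieldReference)} bytes")
    println(s"local copy:      ${serializedSize(s.splitWithLocalCopy)} bytes")
  }
}
```

The field-referencing closure should serialize to more than a megabyte, while the local-copy version stays small. In a Spark job that difference would be paid for every task that ships the closure.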

+2

Source: https://habr.com/ru/post/1653212/

