Is .parallelize(...) a lazy operation in Apache Spark?

Is parallelize (and other loading operations) performed only during the execution of a Spark action, or immediately when it is encountered?

See def parallelize in the Spark source code.

Note the consequences this has, for example for .textFile(...): lazy evaluation means that, while it may save some memory at first, the text file has to be read every time an action is executed, and that changing the file's contents will affect all actions run after the change.

+4
4 answers

parallelize is lazy: see def parallelize in the Spark source (SparkContext.scala, around L726), which carries the comment "@note Parallelize acts lazily".

Spark in general evaluates transformations lazily; computation is triggered only by an action such as collect or count.

How lazy Spark is also depends on which API you use:

  • With the RDD API, transformations only build a lineage graph, and nothing runs until an action is called (a few operations, such as sortByKey, do launch a job early).
  • With the Dataset/DataFrame API, "transformations" build a logical plan that the Catalyst optimizer rewrites; execution again starts only with an action.
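A minimal sketch of the behavior the "@note Parallelize acts lazily" comment describes, assuming a local SparkSession (the mutable buffer is purely illustrative): because no job runs at the parallelize call, mutating the source collection before the first action is reflected in the result.

```scala
import org.apache.spark.sql.SparkSession
import scala.collection.mutable.ArrayBuffer

val spark = SparkSession.builder().master("local[*]").appName("lazy-demo").getOrCreate()
val sc = spark.sparkContext

val data = ArrayBuffer(1, 2, 3)
val rdd = sc.parallelize(data)       // lazy: no job runs here
data += 4                            // mutate the collection before the first action
println(rdd.collect().mkString(",")) // the action sees the modified collection
spark.stop()
```

This is why the source comment recommends passing a copy of a mutable collection to parallelize.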
+3


parallelize is lazy (as Chandan's answer points out), and so is SparkContext.textFile.
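A sketch of the consequence raised in the question, assuming an existing SparkContext `sc` and a hypothetical file notes.txt: textFile only records the path, and each action re-reads the file from storage.

```scala
val lines = sc.textFile("notes.txt") // lazy: the file is not read here
println(lines.count())               // action: the file is read now

// If notes.txt is modified on disk at this point, a later action
// re-reads it and can observe the changed contents:
println(lines.count())
```

Persisting the RDD (lines.cache()) before the first action is the usual way to pin the contents that later actions see.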

DataFrameReader.load, however, is only partially lazy: it may need an eager step to determine the schema. For sources with a metadata catalog (JDBC, Cassandra) this is a metadata query; for formats without one (CSV and the like) it can require scanning the data itself.
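A sketch of that difference, assuming an existing SparkSession `spark` and a hypothetical file data.csv: asking Spark to infer the schema forces an eager scan, while supplying the schema up front keeps the read fully lazy.

```scala
import org.apache.spark.sql.types._

// Eager step: inferSchema makes Spark scan the CSV before any action.
val inferred = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("data.csv")

// Fully lazy: with an explicit schema, no job runs until an action.
val schema = StructType(Seq(
  StructField("id", IntegerType),
  StructField("name", StringType)
))
val lazyDf = spark.read.schema(schema).csv("data.csv")
```

On large datasets, providing the schema explicitly avoids a whole extra pass over the input.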

+2

Calling parallelize creates an RDD, but no computation happens at that point. Like all transformations, operations on the RDD are evaluated lazily, and only an action triggers the actual work. By default the RDD's lineage is recomputed every time an action runs; to avoid that, persist (cache) the RDD.
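A sketch of the recompute-versus-cache behavior, assuming an existing SparkContext `sc` (`expensive` is a made-up stand-in for costly work):

```scala
def expensive(i: Int): Int = { Thread.sleep(1); i * i } // stand-in for costly work

val squares = sc.parallelize(1 to 1000).map(expensive)
squares.count() // runs expensive() for every element
squares.count() // runs it all again: the lineage is recomputed

val cached = squares.cache() // mark for persistence (takes effect on the next action)
cached.count()               // computes once more and stores the partitions in memory
cached.count()               // now served from the cache
```

cache() is shorthand for persist(StorageLevel.MEMORY_ONLY); other storage levels trade memory for disk or serialization cost.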

+1

parallelize(), like other transformations, is lazy.

Remember: transformations describe the computation, while actions trigger it.

The Spark programming guide puts it this way: "All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset (e.g. a file). The transformations are only computed when an action requires a result to be returned to the driver program."
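The quoted behavior can be sketched in a few lines, assuming an existing SparkContext `sc`:

```scala
val nums = sc.parallelize(Seq(1, 2, 3, 4))
val doubled = nums.map(_ * 2)          // transformation: nothing runs yet
val evens = doubled.filter(_ % 4 == 0) // still nothing: only lineage is recorded
val result = evens.collect()           // action: the job runs now, on the driver's request
```

Until collect() is called, Spark has only recorded the chain of transformations; no data is touched.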

Take a look at the article to learn more about transformations in Scala.

Read more about this in the documentation.

+1

Source: https://habr.com/ru/post/1621855/
