Is .parallelize(...) a lazy operation in Apache Spark?

Is parallelize (and other loading operations) performed only during the execution of a Spark action, or immediately when it is encountered?

See def parallelize in the Spark source code.

Note the consequences this has, for example for .textFile(...): lazy evaluation means that, while it may save some memory at first, the text file has to be read every time an action is executed, and that changing the file's contents will affect all actions run after the change.

+4
4 answers

parallelize is lazy: see def parallelize in the Spark source (SparkContext.scala, around L726), which carries the comment "@note Parallelize acts lazily".

Spark in general evaluates transformations lazily; computation is triggered only by an action such as collect or count.

How lazy Spark is also depends on which API you use:

  • With the RDD API, transformations only build a lineage graph, and nothing runs until an action is called (a few operations, such as sortByKey, do launch a job early).
  • With the Dataset/DataFrame API, "transformations" build a logical plan that the Catalyst optimizer rewrites; execution again starts only with an action.
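A minimal sketch of the behavior the "@note Parallelize acts lazily" comment describes, assuming a local SparkSession (the mutable buffer is purely illustrative): because no job runs at the parallelize call, mutating the source collection before the first action is reflected in the result.

```scala
import org.apache.spark.sql.SparkSession
import scala.collection.mutable.ArrayBuffer

val spark = SparkSession.builder().master("local[*]").appName("lazy-demo").getOrCreate()
val sc = spark.sparkContext

val data = ArrayBuffer(1, 2, 3)
val rdd = sc.parallelize(data)       // lazy: no job runs here
data += 4                            // mutate the collection before the first action
println(rdd.collect().mkString(",")) // the action sees the modified collection
spark.stop()
```

This is why the source comment recommends passing a copy of a mutable collection to parallelize.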
+3


parallelize is lazy (as Chandan's answer points out), and so is SparkContext.textFile.
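A sketch of the consequence raised in the question, assuming an existing SparkContext `sc` and a hypothetical file notes.txt: textFile only records the path, and each action re-reads the file from storage.

```scala
val lines = sc.textFile("notes.txt") // lazy: the file is not read here
println(lines.count())               // action: the file is read now

// If notes.txt is modified on disk at this point, a later action
// re-reads it and can observe the changed contents:
println(lines.count())
```

Persisting the RDD (lines.cache()) before the first action is the usual way to pin the contents that later actions see.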

DataFrameReader.load, however, is only partially lazy: it may need an eager step to determine the schema. For sources with a metadata catalog (JDBC, Cassandra) this is a metadata query; for formats without one (CSV and the like) it can require scanning the data itself.
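A sketch of that difference, assuming an existing SparkSession `spark` and a hypothetical file data.csv: asking Spark to infer the schema forces an eager scan, while supplying the schema up front keeps the read fully lazy.

```scala
import org.apache.spark.sql.types._

// Eager step: inferSchema makes Spark scan the CSV before any action.
val inferred = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("data.csv")

// Fully lazy: with an explicit schema, no job runs until an action.
val schema = StructType(Seq(
  StructField("id", IntegerType),
  StructField("name", StringType)
))
val lazyDf = spark.read.schema(schema).csv("data.csv")
```

On large datasets, providing the schema explicitly avoids a whole extra pass over the input.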

+2

Calling parallelize creates an RDD, but no computation happens at that point. Like all transformations, operations on the RDD are evaluated lazily, and only an action triggers the actual work. By default the RDD's lineage is recomputed every time an action runs; to avoid that, persist (cache) the RDD.
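A sketch of the recompute-versus-cache behavior, assuming an existing SparkContext `sc` (`expensive` is a made-up stand-in for costly work):

```scala
def expensive(i: Int): Int = { Thread.sleep(1); i * i } // stand-in for costly work

val squares = sc.parallelize(1 to 1000).map(expensive)
squares.count() // runs expensive() for every element
squares.count() // runs it all again: the lineage is recomputed

val cached = squares.cache() // mark for persistence (takes effect on the next action)
cached.count()               // computes once more and stores the partitions in memory
cached.count()               // now served from the cache
```

cache() is shorthand for persist(StorageLevel.MEMORY_ONLY); other storage levels trade memory for disk or serialization cost.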

+1

parallelize(), like other transformations, is lazy.

Remember: transformations describe the computation, while actions trigger it.

The Spark programming guide puts it this way: "All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset (e.g. a file). The transformations are only computed when an action requires a result to be returned to the driver program."
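The quoted behavior can be sketched in a few lines, assuming an existing SparkContext `sc`:

```scala
val nums = sc.parallelize(Seq(1, 2, 3, 4))
val doubled = nums.map(_ * 2)          // transformation: nothing runs yet
val evens = doubled.filter(_ % 4 == 0) // still nothing: only lineage is recorded
val result = evens.collect()           // action: the job runs now, on the driver's request
```

Until collect() is called, Spark has only recorded the chain of transformations; no data is touched.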

Take a look at the article to learn more about transformations in Scala.

Read more about this in the documentation.

+1

Source: https://habr.com/ru/post/1621855/
