Where is the Spark RDD lineage stored?

Where is the Spark RDD lineage stored? According to the RDD white paper, it is kept in memory, but I want to know whether it lives on the driver or somewhere else in the cluster.

Also, regarding fault tolerance: how many copies of an RDD (its metadata) are created by default?

I want to understand the default behavior of the framework if we do not use the persist() method.

1 answer

The RDD lineage lives on the driver, where the RDD objects themselves live. Once jobs are submitted, this information is no longer relevant. The lineage is an internal part of every RDD, and it is how an RDD knows its parents.
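To make the "knows its parents" idea concrete, here is a minimal toy sketch in plain Python. It is not Spark code: `ToyRDD` and its methods are hypothetical names invented for illustration, and the only assumption is that each handle keeps references to the handles it was derived from. In real Spark, the comparable information can be inspected on the driver through `RDD.toDebugString` and, in the Scala API, `RDD.dependencies`.

```python
# Toy model of RDD lineage. This is NOT Spark's implementation; ToyRDD is a
# hypothetical class that only illustrates parent references held by a handle.
class ToyRDD:
    """A driver-side handle that records how it was derived, not the data."""

    def __init__(self, name, parents=()):
        self.name = name
        # References to parent handles: this chain is the lineage.
        self.parents = tuple(parents)

    def map(self, label):
        # A transformation only creates a new handle pointing at its parent.
        return ToyRDD(f"map({label})", parents=[self])

    def filter(self, label):
        return ToyRDD(f"filter({label})", parents=[self])

    def lineage(self, depth=0):
        """Walk the parent references, one indented line per ancestor."""
        lines = ["  " * depth + self.name]
        for parent in self.parents:
            lines.extend(parent.lineage(depth + 1))
        return lines


source = ToyRDD("textFile(input)")
result = source.map("parse").filter("valid")
print("\n".join(result.lineage()))
```

Nothing in this sketch holds data or runs on a worker; the chain of parent references exists only in the process that built it, which mirrors the answer's point that the lineage is part of the RDD object on the driver.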



Source: https://habr.com/ru/post/1623698/

