What happens if RDD cannot fit into Spark's memory?

As far as I know, Spark tries to perform all computation in memory unless you call persist with a storage level that includes disk. If, however, we do not persist at all, what does Spark do when an RDD does not fit in memory? What if we have very big data? How will Spark handle it without failing?

1 answer

From the Apache Spark FAQ:

Spark's operators spill data to disk if it does not fit in memory, allowing it to run well on data of any size. Likewise, cached datasets that do not fit in memory are either spilled to disk or recomputed on the fly when needed, as determined by the RDD's storage level.

Refer to the RDD persistence section of the programming guide to learn more about the available storage levels and how to choose an appropriate one: programming-guide.html
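For illustration, here is a minimal sketch (the input path, app name, and data sizes are hypothetical) showing how you can pick a storage level explicitly when caching an RDD. With the default cache() (which is MEMORY_ONLY), partitions that do not fit are simply not cached and are recomputed from lineage when needed; with MEMORY_AND_DISK they are spilled to local disk instead:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object StorageLevelDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("storage-level-demo") // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    // Hypothetical large input; the path is an assumption for illustration.
    val lines = spark.sparkContext.textFile("hdfs:///data/big-input.txt")
    val words = lines.flatMap(_.split("\\s+"))

    // MEMORY_AND_DISK: partitions that fit stay in memory as deserialized
    // objects; partitions that do not fit are spilled to local disk rather
    // than being dropped and recomputed later.
    words.persist(StorageLevel.MEMORY_AND_DISK)

    println(words.count())            // First action materializes and caches the RDD.
    println(words.distinct().count()) // Reuses the cached/spilled partitions.

    spark.stop()
  }
}
```

Either way the job still completes; the storage level only trades recomputation time against disk I/O. And, as the FAQ quote above notes, shuffle data produced by Spark's operators is spilled to disk when it does not fit in memory, regardless of any explicit persist call.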


Source: https://habr.com/ru/post/1607421/