How does Spark process data that exceeds available memory? What is the underlying principle?

As I understand it, Spark caches data in memory and then performs its computations on that in-memory data. But what happens when the data is larger than the available memory? I could read the source code, but I don't know which class is responsible for scheduling this. Could you explain the principle of how Spark deals with this situation?

1 answer

om-nom-nom gave an answer, but only as a comment for some reason, so I decided to post it as a real answer:

https://spark.apache.org/docs/latest/scala-programming-guide.html#rdd-persistence
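In short, the linked section covers RDD persistence and storage levels, which control what Spark does with partitions that don't fit in memory. As a minimal Scala sketch (the input path and variable names here are my own placeholders, and sc is assumed to be an existing SparkContext), a storage level such as MEMORY_AND_DISK tells Spark to spill partitions that don't fit in memory to disk rather than fail:

import org.apache.spark.storage.StorageLevel

// Hypothetical input path; replace with your own data set.
val lines = sc.textFile("hdfs:///data/large-input.txt")

// MEMORY_AND_DISK: partitions that do not fit in memory are
// spilled to disk and read back from there when needed,
// instead of being recomputed on the fly (as with MEMORY_ONLY).
val cached = lines.persist(StorageLevel.MEMORY_AND_DISK)

// Trigger the computation; Spark keeps as many partitions in
// memory as it can and falls back to disk for the rest.
println(cached.count())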

