Where does Spark store data when the storage level is set to disk?

I was wondering which directory Spark stores data in when the storage level is set to DISK_ONLY or MEMORY_AND_DISK (for the partitions that do not fit into memory). I ask because it does not seem to matter which level I set: if a program crashes with MEMORY_ONLY, it also crashes with all the other levels.
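For reference, I persist roughly like this (a minimal sketch; the input path is a placeholder):

```scala
import org.apache.spark.storage.StorageLevel

// Hypothetical RDD; the HDFS path is a placeholder.
val rdd = sc.textFile("hdfs:///some/input")

// DISK_ONLY writes all cached partitions to Spark's local directories.
rdd.persist(StorageLevel.DISK_ONLY)

// MEMORY_AND_DISK keeps what fits in memory and spills the rest:
// rdd.persist(StorageLevel.MEMORY_AND_DISK)

rdd.count()  // materializes the RDD so the blocks are actually stored
```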

In the cluster that I use, the /tmp directory is a RAM disk and therefore limited in size. Is Spark trying to store disk-level data on this disk? Maybe that is why I do not see a difference between the levels. If so, how can I change this default behavior? If I use the YARN cluster that comes with Hadoop, do I need to change the /tmp folder in the Hadoop configuration files, or is it enough to change spark.local.dir in Spark?

1 answer

Yes, Spark will try to store disk-level data on that disk: by default its local directory is /tmp.

However, when you run Spark on YARN, Spark uses the local directories configured for YARN (the Hadoop YARN setting yarn.nodemanager.local-dirs). If you set spark.local.dir, it will be ignored.
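To be explicit about what gets ignored: in standalone or local mode you could set spark.local.dir yourself, something like the sketch below (the directory is a placeholder; under YARN this setting has no effect):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Only honored outside YARN; under YARN the node manager's
// local dirs win. The directory below is a placeholder.
val conf = new SparkConf()
  .setAppName("disk-spill-example")
  .set("spark.local.dir", "/data/spark-tmp")
val sc = new SparkContext(conf)
```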

See: https://spark.apache.org/docs/latest/running-on-yarn.html#important-notes

So, if you want to change where disk-level data is stored on YARN, change yarn.nodemanager.local-dirs in the YARN configuration, as sketched below.
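Concretely, that means editing yarn-site.xml on the node managers, along these lines (the directories are placeholders; point them at real disks, not the RAM-backed /tmp, and restart the node managers afterwards):

```xml
<!-- yarn-site.xml: comma-separated list of local directories -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/yarn/local,/data2/yarn/local</value>
</property>
```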

