I was wondering which directory Spark uses to store persisted data when the storage level is set to DISK_ONLY or MEMORY_AND_DISK (for the portion of data that does not fit into memory). I ask because the storage level I choose does not seem to matter: if a program crashes with MEMORY_ONLY, it also crashes with every other level. A sketch of the kind of persist call I mean is below.
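For context, this is roughly the pattern I have in mind (a simplified sketch, not my actual job; the dataset and sizes are made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("persist-test").getOrCreate()
    val sc = spark.sparkContext

    // An RDD larger than the executors' available memory
    val bigRdd = sc.parallelize(1 to 100000000).map(i => (i, i.toString * 10))

    // Persist to disk only -- I would expect the blocks to be written
    // to Spark's local scratch directories rather than kept on the heap
    bigRdd.persist(StorageLevel.DISK_ONLY)
    println(bigRdd.count())

    spark.stop()
  }
}
```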
On the cluster I use, the /tmp directory is a RAM disk and therefore limited in size. Does Spark try to store its disk-level data there? If so, that might be why I don't see any difference between storage levels. And if that is the case, how do I change this default behavior? If I run on the YARN cluster that ships with Hadoop, do I need to change the /tmp folder in the Hadoop configuration files, or is it enough to change spark.local.dir in the Spark configuration?
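In other words, is something like the following the right approach? (The path `/data/spark-scratch` is just a hypothetical directory on a real disk; I'm not sure whether this setting is even honored under YARN or whether YARN's own local-dirs configuration takes precedence.)

```scala
import org.apache.spark.sql.SparkSession

object LocalDirTest {
  def main(args: Array[String]): Unit = {
    // Attempt to point Spark's scratch space at a real disk instead of the
    // RAM-backed /tmp. "/data/spark-scratch" is a hypothetical path.
    val spark = SparkSession.builder()
      .appName("local-dir-test")
      .config("spark.local.dir", "/data/spark-scratch")
      .getOrCreate()

    // ... job that persists / spills to disk ...

    spark.stop()
  }
}
```

Or equivalently on the command line: `spark-submit --conf spark.local.dir=/data/spark-scratch ...`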