Running on AWS EC2 I occasionally run into disk space issues, even after setting spark.cleaner.ttl to a few hours (we iterate quickly). I decided to solve them by moving the /root/spark/work directory onto the instance's ephemeral disk (I use r3.large instances, which have a 32 GB ephemeral disk mounted at /mnt):
readonly HOST=some-ec2-hostname-here
ssh -t root@$HOST spark/sbin/stop-all.sh
ssh -t root@$HOST "for SLAVE in \$(cat /root/spark/conf/slaves) ; do ssh \$SLAVE 'rm -rf /root/spark/work && mkdir /mnt/work && ln -s /mnt/work /root/spark/work' ; done"
ssh -t root@$HOST spark/sbin/start-all.sh
As far as I can tell, as of Spark 1.5 the work directory still does not use the ephemeral storage by default. I haven't dug into the deploy settings to confirm whether this is even configurable.
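An alternative to the symlink trick would be to point Spark at /mnt directly via its standalone-mode configuration. A sketch, assuming the standard conf/spark-env.sh mechanism on each worker (SPARK_WORKER_DIR and SPARK_LOCAL_DIRS are documented Spark settings; the /mnt/work and /mnt/spark paths are just choices mirroring the setup above):

```shell
# conf/spark-env.sh on each worker node
# (propagate it to all slaves, e.g. with spark-ec2's copy-dir script).

# Where standalone workers put per-application scratch and log directories
# (the directory otherwise defaulting to spark/work):
export SPARK_WORKER_DIR=/mnt/work

# Optionally also move shuffle/spill scratch space off the root volume:
export SPARK_LOCAL_DIRS=/mnt/spark
```

Restart the cluster (sbin/stop-all.sh, then sbin/start-all.sh) for the change to take effect.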