Why does Spark report spark.SparkException: File ./someJar.jar exists and does not match the contents ...?

Sometimes when I run Spark jobs, I get the following error message:

13/10/21 21:27:35 INFO cluster.ClusterTaskSetManager: Loss was due to spark.SparkException: File ./someJar.jar exists and does not match the contents ...

What does it mean? How do I diagnose and fix this?

+6
2 answers

After searching the logs further, I also found "no space left on device" exceptions, and when I ran df -h and df -i on each node, I found that a partition was full. Interestingly, this partition is apparently not used for data but for temporarily storing jars. It was mounted at something like /var/run or /run .
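
For reference, here is a rough sketch of how to run those checks across every worker at once; it assumes a standalone cluster whose worker hostnames are listed in conf/slaves, with the standard spark-ec2 paths (both of these are assumptions, adjust to your layout):

 # sketch: report disk space and inode usage on every worker listed in conf/slaves
 for SLAVE in $(cat /root/spark/conf/slaves); do
   echo "== $SLAVE =="
   ssh "$SLAVE" 'df -h && df -i'
 done

Note that a partition showing plenty of free space in df -h can still be effectively full in df -i if it has run out of inodes, so it is worth checking both.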

The solution was to clear the old files out of that partition and set up some automatic cleanup. I think setting spark.cleaner.ttl to one day (86400 seconds) should prevent this from happening again.
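
One way to make that setting stick across jobs, sketched below, is to put it in the cluster-wide configuration. The paths are assumptions based on the spark-ec2 layout, spark-defaults.conf is a Spark 1.x convention, and note that newer Spark releases have since removed spark.cleaner.ttl entirely, so this only applies to the older versions discussed here:

 # sketch: persist the cleaner TTL (in seconds) so every job picks it up
 echo "spark.cleaner.ttl 86400" >> /root/spark/conf/spark-defaults.conf
 # on the Spark 0.x versions from this era, the same value can instead be passed
 # as a Java system property via spark-env.sh:
 # echo 'export SPARK_JAVA_OPTS="$SPARK_JAVA_OPTS -Dspark.cleaner.ttl=86400"' >> /root/spark/conf/spark-env.sh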

+7

Running on AWS EC2 I occasionally ran into disk space issues even after setting spark.cleaner.ttl to a few hours (we iterate quickly). I solved them by moving the /root/spark/work directory onto the instance's mounted ephemeral disk (I use r3.larges, which have a 32 GB ephemeral volume mounted at /mnt ):

 readonly HOST=some-ec2-hostname-here
 ssh -t root@$HOST spark/sbin/stop-all.sh
 ssh -t root@$HOST "for SLAVE in \$(cat /root/spark/conf/slaves) ; do ssh \$SLAVE 'rm -rf /root/spark/work && mkdir /mnt/work && ln -s /mnt/work /root/spark/work' ; done"
 ssh -t root@$HOST spark/sbin/start-all.sh
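
A quick way to confirm the move took effect on every slave (same assumed hostname and paths as above) might be:

 # sketch: verify the work dir now resolves to the ephemeral mount on each slave
 ssh -t root@$HOST "for SLAVE in \$(cat /root/spark/conf/slaves) ; do ssh \$SLAVE 'readlink -f /root/spark/work && df -h /mnt' ; done"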

As far as I can tell, as of Spark 1.5 the work directory still does not use the mounted ephemeral storage by default. I have not dug into the deployment settings enough to be sure whether this is even configurable.
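
One related option worth noting: the standalone deployment documents a SPARK_WORKER_DIR setting in conf/spark-env.sh that controls where workers put application jars, logs, and scratch space (the default is SPARK_HOME/work). A rough sketch of using it instead of the symlink, with the same assumed spark-ec2 paths and placeholder hostname as above; whether it covers exactly the same files as replacing /root/spark/work in every version is not guaranteed:

 # sketch: point each worker's work directory at the ephemeral mount, then restart
 readonly HOST=some-ec2-hostname-here
 ssh -t root@$HOST spark/sbin/stop-all.sh
 ssh -t root@$HOST "for SLAVE in \$(cat /root/spark/conf/slaves) ; do ssh \$SLAVE 'mkdir -p /mnt/work && echo export SPARK_WORKER_DIR=/mnt/work >> /root/spark/conf/spark-env.sh' ; done"
 ssh -t root@$HOST spark/sbin/start-all.sh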

+1

Source: https://habr.com/ru/post/974931/
