We execute SQL queries against a Spark cluster on EMR through the Spark Thrift Server. When a SQL query (submitted as a Spark job) completes, the files it wrote under /mnt/yarn/usercache/root/appcache are not cleared. After we run multiple queries, this eventually triggers a "No space left on device" error.
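To confirm that the directory grows after each query rather than being cleaned up, one option is to record its total size between queries. The following is a minimal sketch, not part of our setup: the helper name dir_size_bytes is hypothetical, and it assumes it runs on a core/task node with read access to /mnt/yarn/usercache/root/appcache.

```python
import os
import sys


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except FileNotFoundError:
                # A file can disappear while we walk (e.g. a container finishes).
                continue
    return total


if __name__ == "__main__":
    # Hypothetical usage: pass the directory to measure, or fall back to the
    # appcache path from the question.
    root = sys.argv[1] if len(sys.argv) > 1 else "/mnt/yarn/usercache/root/appcache"
    print(f"{root}: {dir_size_bytes(root) / (1024 ** 2):.1f} MiB")
```

Running it before and after each query (for example, from cron on each node) would show whether the size only ever increases while the Thrift Server stays up.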
If we stop the Spark Thrift Server, the shuffle files are cleared. Is it possible to have them cleaned up after each job completes, not only after the application stops? We tried setting the following parameters:
yarn.nodemanager.localizer.cache.cleanup.interval-ms=6000
yarn.nodemanager.localizer.cache.target-size-mb=1000
but the files are still not cleared. Any idea why this is happening and how we can avoid it?