Spark Thrift Server does not clean shuffle files

Question


We run SQL queries against a Spark cluster on Amazon EMR through the Spark Thrift Server. We see that when a SQL query (translated into a Spark job) completes, the shuffle files it leaves under /mnt/yarn/usercache/root/appcache are not cleaned up. After multiple queries have run, this eventually triggers a "No space left on device" error.
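To see where the space goes, a small script can total the on-disk size of each application directory under the appcache path from the question. This is a minimal sketch, assuming Python 3.9+ on the node and that each subdirectory of appcache is named after one YARN application; `appcache_usage` is a hypothetical helper, not part of Spark or YARN.

```python
import os
from pathlib import Path

# Path taken from the question; adjust for other users or clusters.
APPCACHE_DIR = "/mnt/yarn/usercache/root/appcache"


def appcache_usage(root: str = APPCACHE_DIR) -> dict[str, int]:
    """Return the total bytes stored under each application directory in root.

    Hypothetical helper: assumes every subdirectory of root belongs to one
    YARN application. Plain files directly under root are ignored.
    """
    usage: dict[str, int] = {}
    base = Path(root)
    if not base.is_dir():
        return usage
    for app_dir in base.iterdir():
        if not app_dir.is_dir():
            continue
        total = 0
        # Walk everything below the application directory (e.g. Spark's
        # block manager directories that hold shuffle data).
        for dirpath, _dirnames, filenames in os.walk(app_dir):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    # File was deleted between listing and stat; skip it.
                    pass
        usage[app_dir.name] = total
    return usage


if __name__ == "__main__":
    # Largest application directories first.
    for app, size in sorted(appcache_usage().items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 1024**2:10.1f} MiB  {app}")
```

Running it periodically while queries execute should show whether a single application's directory (such as the long-running Thrift Server's) keeps growing.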

If we stop the Spark Thrift Server, the shuffle files are cleaned up. Is it possible to have them cleaned not only when the application stops, but also after each job? We tried setting the following parameters:

yarn.nodemanager.localizer.cache.cleanup.interval-ms=6000
yarn.nodemanager.localizer.cache.target-size-mb=1000

but the files are still not cleaned up. Any idea why this happens and how we can avoid it?
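For context, the Apache Hadoop yarn-default.xml documentation describes yarn.nodemanager.localizer.cache.target-size-mb as a target retention size that only includes resources with PUBLIC and PRIVATE visibility and excludes resources with APPLICATION visibility, and yarn.nodemanager.localizer.cache.cleanup.interval-ms as the interval between cache cleanups (6000 ms means every 6 seconds). The sketch below is a simplified, hypothetical model of that kind of size-bounded cleanup (unreferenced entries evicted least-recently-used first); it is not YARN's actual code, and `CachedResource` and `cleanup` are illustrative names. It shows that such a pass only touches tracked cache entries that no running container references.

```python
from dataclasses import dataclass


@dataclass
class CachedResource:
    # Illustrative model of a localized cache entry, not a YARN class.
    path: str
    size_mb: int
    last_used_ms: int
    ref_count: int  # > 0 while a running container still uses the entry


def cleanup(resources: list[CachedResource], target_size_mb: int) -> list[CachedResource]:
    """Return the entries evicted to bring the cache down to target_size_mb.

    Simplified model: only unreferenced entries are eligible, and the
    least-recently-used ones go first. If everything left is still in use,
    the cache can stay above the target.
    """
    total = sum(r.size_mb for r in resources)
    evicted: list[CachedResource] = []
    for r in sorted(resources, key=lambda r: r.last_used_ms):
        if total <= target_size_mb:
            break
        if r.ref_count > 0:
            continue  # still referenced by a running application
        evicted.append(r)
        total -= r.size_mb
    return evicted


if __name__ == "__main__":
    cache = [
        CachedResource("a.jar", 600, last_used_ms=1, ref_count=0),
        CachedResource("b.jar", 500, last_used_ms=2, ref_count=1),
        CachedResource("c.jar", 300, last_used_ms=3, ref_count=0),
    ]
    print([r.path for r in cleanup(cache, target_size_mb=1000)])
```

In this model, data that the cleanup pass does not track, or that is still referenced, is never removed regardless of the interval or target size, which would be consistent with files belonging to a still-running Thrift Server application being left in place.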


shuffle yarn apache-spark amazon-emr spark-thriftserver

Naama galor Nov 09 '17 at 13:20


No one has answered this question yet.
