Apache Spark has three memory regions:

- Cache: where RDDs are stored when you call `cache` or `persist`.
- Shuffle: memory used for shuffle operations (grouping, repartitioning, `reduceByKey`).
- Heap: where ordinary JVM objects are stored.
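For context, here is a minimal sketch of the settings that size these regions, assuming the unified memory manager in Spark 1.6+ (older versions use `spark.storage.memoryFraction` and `spark.shuffle.memoryFraction` instead); the values are illustrative, not recommendations:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("memory-regions-sketch")
  .config("spark.executor.memory", "100g")        // total JVM heap per executor
  .config("spark.memory.fraction", "0.6")         // share of the heap (minus a small reserve)
                                                  // split between execution (shuffle) and storage (cache)
  .config("spark.memory.storageFraction", "0.5")  // part of that share protected for cached RDDs
  .getOrCreate()
```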
Now I would like to monitor how much memory my job uses in each region, as a percentage of that region's size, so that I know how to tune these settings so that the cache and shuffle do not spill to disk and the heap does not OOM. For instance, every few seconds I would get an update like:
Cache: 40% used (40/100 GB)
Shuffle: 90% used (45/50 GB)
Heap: 10% used (1/10 GB)
I know I could find sweet spots by trial and error using other methods, but I find that very difficult; simply being able to watch the usage would greatly simplify writing and tuning Spark jobs.
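To make the idea concrete, here is a rough sketch of the kind of polling I have in mind for the cache region only, using the `/executors` endpoint of the driver's monitoring REST API (on the UI port, 4040 by default), which reports `memoryUsed` and `maxMemory` for storage memory. The application id and host are placeholders, the JSON handling is deliberately naive, and shuffle and general heap usage would need a different source (e.g. JVM/GC metrics):

```scala
import scala.io.Source

object StorageMemoryPoller {
  // Regexes are a stand-in for real JSON parsing (e.g. with json4s).
  private val usedRe = """"memoryUsed"\s*:\s*(\d+)""".r
  private val maxRe  = """"maxMemory"\s*:\s*(\d+)""".r

  def main(args: Array[String]): Unit = {
    val appId = "app-00000000000000-0000"  // placeholder application id
    val url   = s"http://localhost:4040/api/v1/applications/$appId/executors"
    while (true) {
      val json  = Source.fromURL(url).mkString
      val used  = usedRe.findAllMatchIn(json).map(_.group(1).toLong).sum
      val total = maxRe.findAllMatchIn(json).map(_.group(1).toLong).sum
      if (total > 0)
        println(f"Cache: ${100.0 * used / total}%.0f%% used ($used / $total bytes)")
      Thread.sleep(5000)  // refresh every few seconds, as in the example above
    }
  }
}
```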