I run Spark in standalone mode on a single machine with 128 GB of memory and 32 cores. Below are the settings I think are relevant to my problem:
spark.storage.memoryFraction 0.35
spark.default.parallelism 50
spark.sql.shuffle.partitions 50
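For completeness, the same settings expressed programmatically (a minimal sketch; the master URL and app name are placeholders, not my actual ones):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master:7077")              // standalone master; hostname is a placeholder
  .setAppName("DeviceKMeans")
  .set("spark.storage.memoryFraction", "0.35")
  .set("spark.default.parallelism", "50")
  .set("spark.sql.shuffle.partitions", "50")
val sc = new SparkContext(conf)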
I have a Spark app that loops over 1000 devices. On each iteration (one device) it prepares a feature vector and then calls MLlib's k-means. Around the 25th to 30th iteration of the loop (i.e. while processing the 25th to 30th device), it fails with "java.lang.OutOfMemoryError: Java heap space".
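This is roughly what the loop looks like (a simplified sketch, not my exact code; deviceIds, loadDeviceData, saveResult, and the k-means parameters are placeholders):

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

for (deviceId <- deviceIds) {                    // loop over ~1000 devices
  // Prepare the feature vectors for this device (details omitted)
  val features = loadDeviceData(deviceId)        // RDD[Array[Double]]
    .map(arr => Vectors.dense(arr))
    .cache()
  val model = KMeans.train(features, 5, 20)      // k = 5, maxIterations = 20 are illustrative
  saveResult(deviceId, model)
}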
I tried spark.storage.memoryFraction values from 0.7 down to 0.35, but that did not help. I also tried raising the parallelism/partitions to 200, with no luck. My JVM options are "-Xms25G -Xmx25G -XX:MaxPermSize=512m". The size of my data is only about 2 GB.
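For context, this is roughly how the app is launched (a sketch; the jar name is a placeholder — with spark-submit, -Xmx comes from --driver-memory, and MaxPermSize has to go through spark.driver.extraJavaOptions):

spark-submit \
  --master spark://master:7077 \
  --driver-memory 25G \
  --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=512m" \
  device-kmeans.jar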
Here is the stack trace:
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
    at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1841)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
    at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:138)
    at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:136)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashTable$class.serializeTo(HashTable.scala:125)
    at scala.collection.mutable.HashMap.serializeTo(HashMap.scala:40)
    at scala.collection.mutable.HashMap.writeObject(HashMap.scala:136)
    at sun.reflect.GeneratedMethodAccessor116.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
At first the application looks fine, but after it has been running for a while and processing more and more devices, the Java heap gradually fills up and the JVM never frees the memory. How can I diagnose and fix such a problem?
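So far, the only diagnostic step I know is to capture a heap dump and look at what is holding the memory (these are standard HotSpot flags and JDK tools, nothing Spark-specific):

# Write a heap dump automatically at the moment of the OOM
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/spark-oom.hprof

# Or take a dump from the live driver process
jmap -dump:live,format=b,file=/tmp/driver.hprof <driver-pid>

# Quick histogram of the classes holding the most memory
jmap -histo:live <driver-pid> | head -n 20

But I am not sure how to interpret the result, or how to stop the heap from growing in the first place.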