Spark application - java.lang.OutOfMemoryError: Java heap space

I use a single Spark Standalone machine with 128 GB memory and 32 cores. Below are the settings that I think are relevant to my problem:

 spark.storage.memoryFraction 0.35
 spark.default.parallelism 50
 spark.sql.shuffle.partitions 50

I have a Spark application with a loop over 1000 devices. On each iteration (one device), it prepares a feature vector and then calls MLlib's k-means. Around the 25th-30th iteration of the loop (while processing the 25th to 30th device), it fails with the error "java.lang.OutOfMemoryError: Java heap space".

I tried lowering memoryFraction from 0.7 to 0.35, but that did not help. I also tried raising parallelism / partitions up to 200, with no luck. The JVM options are "-Xms25G -Xmx25G -XX:MaxPermSize=512m". My data is only about 2 GB in size.

Here is the stack trace:

 java.lang.OutOfMemoryError: Java heap space
     at java.util.Arrays.copyOf(Arrays.java:2271)
     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
     at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1841)
     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
     at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:138)
     at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:136)
     at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
     at scala.collection.mutable.HashTable$class.serializeTo(HashTable.scala:125)
     at scala.collection.mutable.HashMap.serializeTo(HashMap.scala:40)
     at scala.collection.mutable.HashMap.writeObject(HashMap.scala:136)
     at sun.reflect.GeneratedMethodAccessor116.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:606)
     at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)

At the beginning the application looks fine, but after it has been running for a while and has processed more and more devices, the Java heap gradually fills up and the JVM does not free the memory. How can I diagnose and fix such a problem?
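To see what is actually accumulating on the heap, one option is to capture a class histogram or a full heap dump while usage is high and inspect it offline (in VisualVM or Eclipse MAT). A minimal sketch using stock JDK tools; the PID is a placeholder you would take from `jps` output:

```shell
# List running JVMs to find the Spark driver's PID
jps -lm

# Quick view of which classes dominate the live heap
jmap -histo:live <driver-pid> | head -n 20

# Full heap dump for offline analysis in VisualVM or Eclipse MAT
jmap -dump:live,format=b,file=/tmp/spark-driver.hprof <driver-pid>
```

If the histogram keeps growing across loop iterations for the same classes, that points at state being retained between devices rather than a one-off spike.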

+5
3 answers

In addition to the driver and executor memory, I would suggest trying the following options:

Also, it would be nice if you could post the code.

+3

You can always use profiler tools like VisualVM to monitor memory growth. Hopefully you are using a 64-bit JVM and not a 32-bit one. A 32-bit process can only use about 2 GB of memory, so tuning the memory settings would be essentially useless. Hope this helps.
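Both checks can be done from the command line as well, without attaching a GUI profiler. A sketch with a placeholder PID:

```shell
# "64-Bit Server VM" in the output confirms a 64-bit JVM
java -version

# Sample heap/GC utilization every 5 seconds; steadily growing O (old gen)
# occupancy across loop iterations is the signature of a leak
jstat -gcutil <spark-pid> 5000
```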

+1

JVM parameters alone are not enough to configure Spark memory; you also need to set spark.driver.memory (for the driver, obviously) and spark.executor.memory (for the workers). The default value is 1g. See this detailed guide for more details. In fact, I urge you to read it: it covers a lot, and getting familiar with it will certainly pay off later.
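For example, on a standalone cluster these settings are typically passed at launch time; a sketch, where the master host, class, and jar names are placeholders and the memory values are illustrative rather than a recommendation:

```shell
# --driver-memory / --executor-memory are the CLI equivalents of
# spark.driver.memory and spark.executor.memory
spark-submit \
  --master spark://<master-host>:7077 \
  --driver-memory 25g \
  --executor-memory 25g \
  --conf spark.storage.memoryFraction=0.35 \
  --class com.example.DeviceKMeans \
  my-app.jar
```

The same values can also be set via SparkConf before the SparkContext is created, or in spark-defaults.conf; flags given on the command line take precedence over the defaults file.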

0

Source: https://habr.com/ru/post/1236983/

