"GC upper limit exceeded" on Hadoop.20 datanode

I searched and did not find much information about Hadoop DataNode processes dying because the GC overhead limit was exceeded, so I thought I'd post a question.

We are running a test where we need to confirm that our Hadoop cluster can handle having ~3 million files stored on it (currently a 4-node cluster). We are using a 64-bit JVM and have allocated 8 GB to the namenode. However, as my test program writes more files to DFS, the datanodes begin to die with this error: Exception in thread "DataNode: [/var/hadoop/data/hadoop/data]" java.lang.OutOfMemoryError: GC overhead limit exceeded

I saw several posts about some parameters (parallel GC?) that I think can be set in hadoop-env.sh, but I'm not too sure of the syntax, and being kind of new to this, I don't know how to do it. Thanks for any help here!

+6
4 answers

Try increasing the memory for the datanode using this (a Hadoop restart is required for it to take effect):

export HADOOP_DATANODE_OPTS="-Xmx10g" 

This will set the heap to 10 GB... you can increase it according to your needs.

You can also put this at the beginning of $HADOOP_CONF_DIR/hadoop-env.sh.
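For reference, a minimal sketch of what that could look like inside hadoop-env.sh (the 10g figure is just the value from this answer; size it to your hardware):

# In $HADOOP_CONF_DIR/hadoop-env.sh
# Extra JVM options passed only to the DataNode daemon; -Xmx10g raises
# its maximum heap to 10 GB. Keeping $HADOOP_DATANODE_OPTS at the end
# preserves any options already set elsewhere in the file.
export HADOOP_DATANODE_OPTS="-Xmx10g $HADOOP_DATANODE_OPTS"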

+7

If you run the map reduce job from the command line, you can increase the heap using the -D 'mapreduce.map.java.opts=-Xmx1024m' and/or -D 'mapreduce.reduce.java.opts=-Xmx1024m' options. Example:

 hadoop --config /etc/hadoop/conf jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-*-job.jar --conf /etc/hbase/conf/hbase-site.xml -D 'mapreduce.map.java.opts=-Xmx1024m' --hbase-indexer-file $HOME/morphline-hbase-mapper.xml --zk-host 127.0.0.1/solr --collection hbase-collection1 --go-live --log4j /home/cloudera/morphlines/log4j.properties 

Note that some Cloudera docs still use the old parameters mapred.child.java.opts, mapred.map.child.java.opts and mapred.reduce.child.java.opts. These options no longer work in Hadoop 2 (see "What is the relation between 'mapreduce.map.memory.mb' and 'mapred.map.child.java.opts' in Apache Hadoop YARN?").
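On Hadoop 2 / YARN, a minimal generic sketch of the same idea (the jar name, driver class and paths are placeholders, and it assumes the driver uses ToolRunner so the generic -D options are parsed; the *.memory.mb container limits are set above the -Xmx values because the JVM needs headroom beyond its heap):

hadoop jar my-job.jar com.example.MyDriver \
  -D mapreduce.map.memory.mb=1536 \
  -D mapreduce.map.java.opts=-Xmx1024m \
  -D mapreduce.reduce.memory.mb=3072 \
  -D mapreduce.reduce.java.opts=-Xmx2560m \
  /input /output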

0

This solved the issue for me: Hadoop "GC overhead limit exceeded".

So, the key is to prepend this environment variable (this was my first time seeing this Linux syntax :))

HADOOP_CLIENT_OPTS="-Xmx10g" hadoop jar "your.jar" "source.dir" "target.dir"
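For anyone else seeing that syntax for the first time: VAR=value command sets the variable in the environment of that single command only, without exporting it to the rest of the shell session. A quick demonstration (FOO is just an example name):

FOO=bar sh -c 'echo "$FOO"'   # the child process sees it: prints "bar"
echo "$FOO"                   # the current shell does not: prints an empty line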

0

"GC overhead limit exceeded" indicates that your (tiny) heap is full.

This often happens in MapReduce jobs when you process a lot of data. Try the following:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m -XX:-UseGCOverheadLimit</value>
</property>

(Note that -XX:-UseGCOverheadLimit only disables the early "GC overhead limit exceeded" check; if the heap is genuinely exhausted, the JVM will still throw a plain OutOfMemoryError.) Also try the following things:

Use combinators; reducers should not receive lists longer than small multiples of the number of cards

At the same time, you can generate heap dumps from the OOME and analyze them with YourKit etc. (see the sketch after this list for the JVM flags)
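A minimal sketch of how you might collect those heap dumps from the task JVMs, reusing the old-style property from the answer above (the jar name, driver class, paths and dump directory are placeholders; -XX:+HeapDumpOnOutOfMemoryError makes the JVM write an .hprof file on OOME, which you can then open in YourKit or Eclipse MAT; the dump directory must already exist on the task nodes):

hadoop jar my-job.jar com.example.MyDriver \
  -D mapred.child.java.opts="-Xmx1024m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/mr-heapdumps" \
  /input /output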

-3

Source: https://habr.com/ru/post/912950/

