"GC upper limit exceeded" on Hadoop.20 datanode

I searched and did not find much information about Hadoop DataNode processes dying because the GC overhead limit was exceeded, so I thought I'd post a question.

We are running a test where we need to confirm that our Hadoop cluster can handle having ~3 million files stored on it (currently a 4-node cluster). We are using a 64-bit JVM and have allocated 8 GB to the namenode. However, as my test program writes more files to DFS, the datanodes begin to die with this error: Exception in thread "DataNode: [/var/hadoop/data/hadoop/data]" java.lang.OutOfMemoryError: GC overhead limit exceeded

I saw several posts about some parameters (parallel GC?) that I think can be set in hadoop-env.sh, but I'm not too sure of the syntax, and being kind of new to this, I don't know how to do it. Thanks for any help here!

+6
4 answers

Try increasing the memory for the datanode using this (a Hadoop restart is required for it to take effect):

export HADOOP_DATANODE_OPTS="-Xmx10g" 

This will set the heap to 10 GB... you can increase it according to your needs.

You can also put this at the beginning of $HADOOP_CONF_DIR/hadoop-env.sh.
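For reference, a minimal sketch of what that could look like inside hadoop-env.sh (the 10g figure is just the value from this answer; size it to your hardware):

# In $HADOOP_CONF_DIR/hadoop-env.sh
# Extra JVM options passed only to the DataNode daemon; -Xmx10g raises
# its maximum heap to 10 GB. Keeping $HADOOP_DATANODE_OPTS at the end
# preserves any options already set elsewhere in the file.
export HADOOP_DATANODE_OPTS="-Xmx10g $HADOOP_DATANODE_OPTS"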

+7

If you run the map reduce job from the command line, you can increase the heap using the -D 'mapreduce.map.java.opts=-Xmx1024m' and/or -D 'mapreduce.reduce.java.opts=-Xmx1024m' options. Example:

 hadoop --config /etc/hadoop/conf jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-*-job.jar --conf /etc/hbase/conf/hbase-site.xml -D 'mapreduce.map.java.opts=-Xmx1024m' --hbase-indexer-file $HOME/morphline-hbase-mapper.xml --zk-host 127.0.0.1/solr --collection hbase-collection1 --go-live --log4j /home/cloudera/morphlines/log4j.properties 

Note that some Cloudera docs still use the old parameters mapred.child.java.opts, mapred.map.child.java.opts and mapred.reduce.child.java.opts. These options no longer work in Hadoop 2 (see "What is the relation between 'mapreduce.map.memory.mb' and 'mapred.map.child.java.opts' in Apache Hadoop YARN?").
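On Hadoop 2 / YARN, a minimal generic sketch of the same idea (the jar name, driver class and paths are placeholders, and it assumes the driver uses ToolRunner so the generic -D options are parsed; the *.memory.mb container limits are set above the -Xmx values because the JVM needs headroom beyond its heap):

hadoop jar my-job.jar com.example.MyDriver \
  -D mapreduce.map.memory.mb=1536 \
  -D mapreduce.map.java.opts=-Xmx1024m \
  -D mapreduce.reduce.memory.mb=3072 \
  -D mapreduce.reduce.java.opts=-Xmx2560m \
  /input /output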

0

This solved the issue for me: Hadoop "GC overhead limit exceeded".

So, the key is to prepend this environment variable (this was my first time seeing this Linux syntax :))

HADOOP_CLIENT_OPTS="-Xmx10g" hadoop jar "your.jar" "source.dir" "target.dir"
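For anyone else seeing that syntax for the first time: VAR=value command sets the variable in the environment of that single command only, without exporting it to the rest of the shell session. A quick demonstration (FOO is just an example name):

FOO=bar sh -c 'echo "$FOO"'   # the child process sees it: prints "bar"
echo "$FOO"                   # the current shell does not: prints an empty line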

0

"GC overhead limit exceeded" indicates that your (tiny) heap is full.

This often happens in MapReduce jobs when you process a lot of data. Try the following:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m -XX:-UseGCOverheadLimit</value>
</property>

(Note that -XX:-UseGCOverheadLimit only disables the early "GC overhead limit exceeded" check; if the heap is genuinely exhausted, the JVM will still throw a plain OutOfMemoryError.) Also try the following things:

Use combinators; reducers should not receive lists longer than small multiples of the number of cards

At the same time, you can generate heap dumps from the OOME and analyze them with YourKit etc. (see the sketch after this list for the JVM flags)
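A minimal sketch of how you might collect those heap dumps from the task JVMs, reusing the old-style property from the answer above (the jar name, driver class, paths and dump directory are placeholders; -XX:+HeapDumpOnOutOfMemoryError makes the JVM write an .hprof file on OOME, which you can then open in YourKit or Eclipse MAT; the dump directory must already exist on the task nodes):

hadoop jar my-job.jar com.example.MyDriver \
  -D mapred.child.java.opts="-Xmx1024m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/mr-heapdumps" \
  /input /output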

-3

Source: https://habr.com/ru/post/912950/

