JVM freezes under high load during longevity testing

JVM version:

 java version "1.7.0_79"
 Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
 Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

OS:

 CentOS release 6.4 (Final) 

JVM options:

 -Xmx4g -Xms4g -XX:MaxPermSize=4g -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintClassHistogram -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+DisableExplicitGC 

The application runs in an OSGi environment, with Aerospike DB and Netty (NIO) for networking.

I ran a longevity test over a weekend. This was the last line printed:

 [2015-12-11 09:54:51,185] INFO : [GC pause (young) 
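The symptom here is a GC entry that starts but never finishes. As a rough sketch of how a monitoring script could spot that state, the following checks whether the last log line is an unfinished entry and how long it has been open. The names `TS_RE` and `open_gc_entry_age` are made up for illustration, and the check assumes every finished entry ends with `secs]`, as in this log.

```python
import re
from datetime import datetime

# Matches the timestamp prefix used in this log, e.g. "[2015-12-11 09:54:51,185]".
TS_RE = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")

def open_gc_entry_age(last_line, now):
    """Return seconds since an unfinished GC entry began, or None if it is complete."""
    m = TS_RE.search(last_line)
    if m is None:
        return None
    # Assumption: a finished entry ends with "... secs]"; the frozen one stops early.
    if last_line.rstrip().endswith("secs]"):
        return None
    started = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
    return (now - started).total_seconds()

last = "[2015-12-11 09:54:51,185] INFO : [GC pause (young) "
age = open_gc_entry_age(last, datetime(2015, 12, 11, 10, 0, 0))
print(age)
```

In a real check, `now` would be `datetime.now()` and an alert would fire once the age exceeds something far above the 200 ms pause target.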

After two days I ran strace on the process, and only then were the following lines printed:

 [2015-12-11 09:54:51,185] INFO : [GC pause (young) 3598M->1458M(4096M), 0.0280020 secs]
 [2015-12-13 11:54:54,353] INFO : [GC pause (young) 3598M->1464M(4096M), 180001.5628870 secs]

The first entry was completed, and the next entry reported a GC pause of about two days.
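To make the two-day figure concrete, here is a small sketch that pulls the pause durations out of entries in this format and converts the long ones to days. `PAUSE_RE`, `long_pauses`, and the 1-second threshold are illustrative assumptions, not part of any JVM tooling.

```python
import re

# Matches a completed young-pause entry in the format shown in this log:
# "[GC pause (young) 3598M->1464M(4096M), 180001.5628870 secs]"
PAUSE_RE = re.compile(r"\[GC pause \((\w+)\) (\d+)M->(\d+)M\((\d+)M\), ([\d.]+) secs\]")

def long_pauses(log_text, threshold_secs=1.0):
    """Return (before_mb, after_mb, secs) for every pause longer than threshold_secs."""
    found = []
    for m in PAUSE_RE.finditer(log_text):
        secs = float(m.group(5))
        if secs > threshold_secs:
            found.append((int(m.group(2)), int(m.group(3)), secs))
    return found

log = (
    "[2015-12-11 09:54:51,185] INFO : [GC pause (young) 3598M->1458M(4096M), 0.0280020 secs]\n"
    "[2015-12-13 11:54:54,353] INFO : [GC pause (young) 3598M->1464M(4096M), 180001.5628870 secs]\n"
)
for before, after, secs in long_pauses(log):
    print(f"{before}M->{after}M took {secs / 86400:.2f} days")
```

Only the second entry is reported, at roughly 2.08 days. That matches the gap between the two timestamps, so the reported duration is wall-clock time the process spent stuck rather than time spent collecting.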

The JVM did not respond to thread dump signals during the freeze (pkill -QUIT pid). The freeze occurs every few days, and it happens not only with the G1 collector but also with the CMS collector. How can I start debugging this, and what could be causing it?
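One way to automate the "does it still answer SIGQUIT" check is sketched below. It assumes the JVM's stdout is redirected to a file (HotSpot writes the thread dump to stdout on SIGQUIT) and treats any growth of that file as a response, which is a loose heuristic if the application logs to the same file. `responds_to_sigquit` and its parameters are hypothetical names; the `send` parameter exists only so the function can be exercised without a real JVM.

```python
import os
import signal
import time

def responds_to_sigquit(pid, stdout_log, timeout_secs=10.0, send=os.kill):
    """Send SIGQUIT to pid and report whether stdout_log grows within timeout_secs."""
    size_before = os.path.getsize(stdout_log)
    send(pid, signal.SIGQUIT)  # same signal that "-QUIT" delivers
    deadline = time.monotonic() + timeout_secs
    while time.monotonic() < deadline:
        # Assumption: a live JVM appends a thread dump to its stdout file.
        if os.path.getsize(stdout_log) > size_before:
            return True
        time.sleep(0.2)
    return False
```

A watchdog could call `responds_to_sigquit(jvm_pid, "/path/to/jvm-stdout.log")` periodically (both arguments are placeholders) and record the time of the first `False`, which narrows down when the freeze began.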

Thanks.

EDIT: There was another freeze, and this time strace did not release it. The second freeze was released by running jstack.

UPDATE: Found the problem! See the answer below.

1 answer

I found the problem!
This is a kernel bug in futex_wait() that was backported to our kernel version.
You can read about it here:
https://groups.google.com/forum/#!topic/mechanical-sympathy/QbmpZxp6C64

