We are running SonarQube 5.1.2 on an AWS node. After a short period of use, usually a day or two, the Sonar web server stops responding and pegs the server's processors:
top - 01:59:47 up 2 days, 3:43, 1 user, load average: 1.89, 1.76, 1.11
Tasks:  93 total,   1 running,  92 sleeping,   0 stopped,   0 zombie
Cpu(s): 94.5%us,  0.0%sy,  0.0%ni,  5.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7514056k total,  2828772k used,  4685284k free,   155372k buffers
Swap:        0k total,        0k used,        0k free,   872440k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 2328 root      20   0 3260m 1.1g  19m S 188.3 15.5  62:51.79 java
   11 root      20   0     0    0    0 S   0.3  0.0   0:07.90 events/0
 2284 root      20   0 3426m 407m  19m S   0.3  5.5   9:51.04 java
    1 root      20   0 19356 1536 1224 S   0.0  0.0   0:00.23 init
The 188% CPU load comes from the web server process:
$ ps -eF | grep "root *2328"
root  2328  2262  2 834562 1162384 0 Mar01 ? 01:06:24 /usr/java/jre1.8.0_25/bin/java -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djruby.management.enabled=false -Djruby.compile.invokedynamic=false -Xmx768m -XX:MaxPermSize=160m -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=/opt/sonar/temp -cp ./lib/common
Initially we thought we were running on too small an instance, and we recently upgraded to an m3.large, but we are seeing the same problem (except that it now saturates two processors instead of one).
The only interesting information in the log:
2016.03.04 01:52:38 WARN  web[o.e.transport] [sonar-1456875684135] Received response for a request that has timed out, sent [39974ms] ago, timed out [25635ms] ago, action [cluster:monitor/nodes/info], node [[#transport#-1][xxxxxxxx-build02-us-west-2b][inet[/127.0.0.1:9001]]], id [43817]
2016.03.04 01:53:19 INFO  web[o.e.client.transport] [sonar-1456875684135] failed to get node info for [#transport#-1][xxxxxxxx-build02-us-west-2b][inet[/127.0.0.1:9001]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[/127.0.0.1:9001]][cluster:monitor/nodes/info] request_id [43817] timed out after [14339ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:366) ~[elasticsearch-1.4.4.jar:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_25]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_25]
    at java.lang.Thread.run(Unknown Source) [na:1.8.0_25]
Does anyone know what might be going on here, or have ideas on how to diagnose this further?
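One thing we could try next (a sketch, not something we have run yet): map the hot threads inside PID 2328 to Java stack traces. `top -H` lists per-thread CPU with decimal thread IDs, while `jstack` reports the same IDs as hex `nid=0x...` values, so the decimal ID has to be converted. The `hot_thread_stack` helper name and the 2328 PID are just taken from the `top` output above.

```shell
#!/bin/sh
# Sketch: find which Java thread is burning CPU in the web server JVM.
# pid = the java process from top (2328 above); tid = the decimal
# thread ID that `top -H` shows as the hottest.
hot_thread_stack() {
  pid="$1"
  tid="$2"
  # jstack prints native thread IDs in hex (nid=0x...), so convert
  # the decimal TID from top to the hex form jstack uses:
  nid=$(printf 'nid=0x%x' "$tid")
  # Print the matching thread header plus the next 20 stack frames.
  jstack "$pid" | grep -A 20 "$nid"
}

# Intended usage (run as root, the same user as the JVM):
#   top -b -H -n 1 -p 2328 | sort -k 9 -rn | head   # find the hot TID
#   hot_thread_stack 2328 <hot-tid>
```

If the hot threads turn out to be GC threads, that would point at heap pressure (the process is capped at `-Xmx768m`); if they are application threads, the stack traces should show where they are spinning.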