Debugging 100% JBoss CPU Usage

Originally posted on Server Fault, where it was suggested that I ask this question here.

We use JBoss to run our two WARs. One is our web application, the other is our web service. The web application accesses the database on another machine and makes requests to the web service. The web service makes JMS requests to other machines, aggregates the data, and returns it.

At our largest client, about once a month, the JBoss Java process takes up 100% of all processors. There are 8 processors on the JBoss machine. Our web application is still available during this time, but it takes about 3 minutes to load the pages. Restarting JBoss restores everything to its normal level.

The database machine and all the other machines are fine; only the machine running JBoss is affected. Memory usage is normal. Network usage is normal. There are no error messages in the JBoss logs.

I set up a test environment as close as possible to the client's production environment and ran load tests with many concurrent users. I could not get the test environment to reproduce the problem.

Where do we go from here? How can we narrow down the problem?

Currently, the only option we have is to wait until the problem occurs in production on its own, and then do some debugging to determine the cause. So far, people have just restarted JBoss when the problem occurs to minimize downtime. The next time it happens, they will call in a developer. The question is: when it happens, what can be done to determine the cause?

We could set up a second JBoss instance on the same machine and deploy the web application separately from the web service. That way, the next time the problem occurs, we will know which WAR is responsible (assuming it is our code). That does not narrow it down much, though.

Should I enable remote JMX? That way, the next time the problem occurs, I can connect with VisualVM and see which threads are consuming CPU and what they are doing. However, is there a significant downside to enabling remote JMX in a production environment?

Is there any other way to see which threads are consuming CPU and get a stack trace to see what they are doing?

Any other ideas?

Thanks!

4 answers

I think you should definitely try to set up a test environment with some workloads to reproduce your problem. Profiling will definitely help to identify the problem.

A quick fix: the next time it happens, send JBoss a kill -3 to get a thread dump for analysis. The second thing I would check is that you are using the -server flag and that your GC settings are sane. You can also run dstat to find out what the process is doing during the lockup. But then again, it's probably safer to just set up a load testing environment (via EC2 or so) to reproduce this.
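As a rough illustration of the flag and GC check, the standard java.lang.management API can report the VM name, the startup arguments, and cumulative GC activity. This is only a sketch: the class name JvmSettingsReport is made up for this example, and the code has to run inside the JVM being inspected (for instance from a diagnostic page you add yourself), which is an assumption about your setup.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.List;

// Hypothetical helper, not part of JBoss.
public class JvmSettingsReport {

    /** Builds a plain-text summary of the VM type, startup flags, and GC activity. */
    public static String report() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        StringBuilder out = new StringBuilder();

        // On HotSpot the VM name normally contains "Server" for the server VM.
        out.append("VM name: ").append(runtime.getVmName()).append('\n');
        out.append("Uptime (ms): ").append(runtime.getUptime()).append('\n');

        // Flags passed to the JVM at startup, such as heap and GC options.
        List<String> jvmArgs = runtime.getInputArguments();
        out.append("JVM arguments: ").append(jvmArgs).append('\n');

        // Cumulative counts and times since JVM start. A collector whose time
        // is a large share of the uptime suggests the CPU is going to GC.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append("GC ").append(gc.getName())
               .append(": count=").append(gc.getCollectionCount())
               .append(", timeMs=").append(gc.getCollectionTime())
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Comparing the GC time against the uptime during an incident gives a quick hint about whether garbage collection is what is burning the CPU.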


There's a quick and dirty way to determine which threads are using CPU time on JBoss. Go to the JMX console with a browser (usually at http://localhost:8080/jmx-console, but it may be different for you) and find a bean called ServerInfo. It has an operation called listThreadCpuUtilization that dumps the actual CPU time used by each active thread in a nice tabular format. If one of them is misbehaving, it usually sticks out like a sore thumb.

There is also a listThreadDump operation that dumps the stack of each thread to the browser.

It is not as good as a profiler, but it is a much easier way to get the basic information. For production servers, where connecting a profiler is often bad news, this is very handy.
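The two ServerInfo operations are JBoss-specific. The sketch below is not their implementation; it only shows how similar information can be gathered with the standard ThreadMXBean. The class and method names (ThreadCpuReport, topThreads) are invented for this example, and it assumes Java 8 or later and code running inside the JBoss JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the JBoss ServerInfo bean.
public class ThreadCpuReport {

    /** Returns live threads, highest cumulative CPU time first, with their top stack frames. */
    public static List<String> topThreads(int maxThreads, int maxFrames) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuAvailable =
                threads.isThreadCpuTimeSupported() && threads.isThreadCpuTimeEnabled();

        List<long[]> usage = new ArrayList<>(); // pairs of {threadId, cpuNanos}
        for (long id : threads.getAllThreadIds()) {
            // getThreadCpuTime returns -1 if the thread is no longer alive.
            long cpu = cpuAvailable ? threads.getThreadCpuTime(id) : 0L;
            if (cpu >= 0) {
                usage.add(new long[] {id, cpu});
            }
        }
        usage.sort((a, b) -> Long.compare(b[1], a[1])); // descending by CPU time

        List<String> lines = new ArrayList<>();
        for (long[] u : usage.subList(0, Math.min(maxThreads, usage.size()))) {
            ThreadInfo info = threads.getThreadInfo(u[0], maxFrames);
            if (info == null) {
                continue; // thread ended between the two calls
            }
            StringBuilder sb = new StringBuilder();
            sb.append(info.getThreadName())
              .append(" cpuMs=").append(u[1] / 1_000_000)
              .append(" state=").append(info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\n    at ").append(frame);
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : topThreads(10, 15)) {
            System.out.println(line);
        }
    }
}
```

The CPU times are cumulative since each thread started, so taking two samples a few seconds apart and comparing them shows which threads are busy right now.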


This usually happens with runaway code or with unsynchronized access to HashMaps from multiple threads. A simple thread dump (kill -3, as @disown says, or Ctrl-Break in the Windows console) will show this problem.
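To make the HashMap point concrete: a plain java.util.HashMap written to by several threads at once can have its internal structure corrupted, and on older JDKs a later lookup can then spin forever and pin a CPU. A minimal sketch of the usual remedy, assuming a shared map updated by request threads, is to use ConcurrentHashMap. SharedCacheSketch and its methods are hypothetical names, and the code assumes Java 8 or later.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical shared structure standing in for whatever map the application shares.
public class SharedCacheSketch {

    // A plain HashMap here would be unsafe once several threads write to it.
    private final Map<Integer, Integer> hits = new ConcurrentHashMap<>();

    public void record(int key) {
        hits.merge(key, 1, Integer::sum); // atomic read-modify-write per key
    }

    public int count(int key) {
        return hits.getOrDefault(key, 0);
    }

    /** Runs concurrent writers and returns the total number of recorded hits. */
    public static int runDemo(int threads, int perThread) {
        SharedCacheSketch cache = new SharedCacheSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    cache.record(i % 16);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int total = 0;
        for (int key = 0; key < 16; key++) {
            total += cache.count(key);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(8, 10_000));
    }
}
```

In a thread dump, the unsafe variant typically shows up as several threads stuck in RUNNABLE state inside HashMap methods.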

Since you cannot reproduce it with tests, I think it smells like a concurrency problem; it is usually difficult to get test scripts to behave randomly enough to catch this type of problem.

I usually try to make it standard operating procedure to take a thread dump of any JVM before it is restarted because of an operational anomaly; that really is a requirement for catching these once-a-month events.
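If kill -3 is awkward to build into the restart procedure, one possible alternative is a small in-process helper that writes the dump to a timestamped file. This is a sketch under assumptions: PreRestartDump is a made-up name, the output directory is up to you, it needs Java 7 or later, and something inside the JVM (an admin page, a JMX operation) has to call it before the restart.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for capturing evidence before a restart.
public class PreRestartDump {

    /** Returns the name, state, and full stack of every live thread as text. */
    public static String dump() {
        StringBuilder out = new StringBuilder();
        // false, false: skip lock details to keep the call cheap and widely supported.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append('"')
               .append(" id=").append(info.getThreadId())
               .append(" state=").append(info.getThreadState())
               .append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    /** Writes the dump to a timestamped file in the given directory and returns its path. */
    public static Path dumpToFile(Path dir) {
        Path file = dir.resolve("threads-" + System.currentTimeMillis() + ".txt");
        try {
            Files.write(file, dump().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    public static void main(String[] args) {
        System.out.println(dumpToFile(Paths.get(System.getProperty("java.io.tmpdir"))));
    }
}
```

Taking two or three dumps a few seconds apart makes it easier to tell a thread that is stuck in a loop from one that merely happened to be running.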


If you are using JBoss 5.1.0 EAP, there is a known bug in JBoss, and a fix is available. Here is the URL: https://issues.jboss.org/browse/JBPAPP-5193


Source: https://habr.com/ru/post/1304190/

