High CPU load, possibly due to context switching?

One of our servers is experiencing very high CPU load from our application. We have looked at various statistics and are having trouble finding the source of the problem.

One of the current theories is that there are too many threads involved and that we should try to reduce the number of threads running concurrently. There is a single main thread pool with 3000 threads, and a WorkManager working with it (this is Java EE - Glassfish). At any given moment there are about 620 separate network I/O operations that need to be performed in parallel (using java.nio is also not an option). In addition, there are about 100 operations with no I/O that also run in parallel.
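
For reference, here is a minimal sketch (an editorial addition, not from the question) of what pools sized to the stated parallelism could look like in plain java.util.concurrent, which is available since Java 5. The class name and pool sizes are illustrative assumptions; in the real application the pool is managed through Glassfish's WorkManager.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: the real setup is a single 3000-thread pool driven by a
// Glassfish WorkManager. This sketch sizes two pools close to the actual
// parallelism described in the question (~620 network I/O tasks, ~100 CPU tasks).
public class BoundedPools {
    private static final int IO_THREADS  = 650;  // hypothetical size for the I/O work
    private static final int CPU_THREADS = 100;  // hypothetical size for the CPU-only work

    private final ExecutorService ioPool  = Executors.newFixedThreadPool(IO_THREADS);
    private final ExecutorService cpuPool = Executors.newFixedThreadPool(CPU_THREADS);

    public void submitIoTask(Runnable task)  { ioPool.execute(task); }
    public void submitCpuTask(Runnable task) { cpuPool.execute(task); }

    public void shutdown() {
        ioPool.shutdown();
        cpuPool.shutdown();
    }
}
```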

We know this structure is inefficient and want to find out whether it is actually causing harm or is merely bad practice. The reason is that any change to this system is quite expensive (in terms of man-hours), so we need some evidence of the problem.

So now we are wondering whether thread context switching is the cause, since there are far more threads than the required number of parallel operations. Looking at the logs, we see that on average 14 different threads execute within any given second. Together with the two CPUs (see below), that is about 7 threads per CPU. This does not seem like much, but we wanted to check.

So, can we rule out context switching or too many threads as the problem?

General Information:

  • Java 1.5 (yes, it is old), running on CentOS 5, 64-bit, Linux kernel 2.6.18-128.el5
  • There is only one Java process on the machine; nothing else.
  • Two CPUs, under VMware.
  • 8 GB RAM
  • We are not able to run a profiler on the machine.
  • We cannot upgrade Java or the OS.

UPDATE As suggested below, we captured the load average (using uptime) and CPU usage (using vmstat 1 120) on our test server under different workloads. We waited 15 minutes after each load change before measuring, to let the system stabilize around the new load and to let the load average values update:

50% of the production server workload: http://pastebin.com/GE2kGLkk

34% of the production server workload: http://pastebin.com/V2PWq8CG

25% of the production server workload: http://pastebin.com/0pxxK0Fu

CPU usage does seem to decrease as the load decreases, but not proportionally (going from 50% to 25% does not cut CPU usage in half). The load average does not seem to correlate with the amount of workload.

This also raises a question: since our test server is itself a virtual machine, could its CPU measurements be affected by other virtual machines running on the same host (which would make these measurements useless)?

UPDATE 2 Attaching a thread dump, in three parts (pastebin restrictions):

Part 1: http://pastebin.com/DvNzkB5z

Part 2: http://pastebin.com/72sC00rc

Part 3: http://pastebin.com/YTG9hgF5

+4
6 answers

It seems to me that the 100 CPU-bound threads are the problem more than anything else. The 3000-thread pool is basically a red herring, since idle threads do not cost much. The I/O threads are likely sleeping most of the time, since I/O happens on a geological time scale in terms of computer operations.

You do not say what the 100 CPU-bound threads do or how long they run, but if you want to slow a machine down, dedicating 100 threads of "run until they stop" will certainly do it. With 100 threads always ready to run, the machine will context switch as fast as the scheduler allows, with almost zero idle time. Context switching will have an impact because you are doing it so often. Since the CPU-bound threads are (most likely) consuming most of the CPU time, your I/O-bound threads end up waiting longer in the run queue than they wait for I/O. So even more processes are waiting (the I/O processes simply bail out more often, because they quickly hit the I/O barrier, which puts the process to sleep until the next I/O completes).
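
A toy sketch (an editorial addition, not part of this answer) that reproduces the effect described above on a spare machine: it starts a configurable number of busy-spin threads and times a fixed chunk of work while they compete for the CPUs. With far more spinning threads than cores, the measured time grows roughly in proportion to the number of runnable threads.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy demo: time a fixed amount of work while N CPU-bound threads spin.
public class CpuContentionDemo {
    public static void main(String[] args) throws InterruptedException {
        final int busyThreads = args.length > 0 ? Integer.parseInt(args[0]) : 100;
        final AtomicBoolean stop = new AtomicBoolean(false);

        // Start the busy-spin threads: pure CPU work, always ready to run.
        for (int i = 0; i < busyThreads; i++) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    long x = 1;
                    while (!stop.get()) {
                        x = x * 31 + 1;
                    }
                }
            });
            t.setDaemon(true);
            t.start();
        }

        Thread.sleep(1000); // let the spinners get going

        // Measure a fixed chunk of work while sharing the CPUs with the spinners.
        long start = System.nanoTime();
        long acc = 0;
        for (long i = 0; i < 200000000L; i++) {
            acc += i;
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        stop.set(true);

        System.out.println(busyThreads + " busy threads: fixed work took "
                + elapsedMs + " ms (checksum " + acc + ")");
    }
}
```

Running it with 2 and then with 100 busy threads on the same box gives a direct feel for how much run-queue pressure the CPU-bound threads add.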

No doubt there are efficiencies to be gained, but 100 CPU-bound threads are 100 CPU-bound threads; there is not much you can do about that.

+2

I think your constraints are unreasonable. Basically you are saying:

  1. I can't change anything
  2. I can't measure anything

Could you please guess what my problem is?

The real answer is that you need to hook a proper profiler up to the application and correlate what you see with CPU usage, disk/network I/O, and memory.

Remember the 80/20 rule of performance tuning: 80% of the gains come from tuning your application. It may simply be too much workload for a single VM instance, and it may be time to consider scaling horizontally or vertically by giving the machine more resources. It could also be any of the three billion JVM settings that do not suit your application's runtime behavior.

I assume that the 3,000-thread pool came from the famous "more threads = more concurrency = more performance" theory. The real answer is that a tuning change is worth nothing unless you measure throughput and response time before and after the change and compare the results.

+4

So, can we rule out context switching or too many threads as the problem?

It sounds like you are right to be worried about thrashing. A thread pool of 3,000 threads (700+ concurrent operations) on a two-CPU VMware instance certainly sounds like a setup that could cause contention and context-switching problems. Limiting the number of threads could give you a performance boost, although determining the right number will be difficult and will probably take a lot of trial and error.

we need some evidence of the problem.

I'm not sure of the best way to answer, but here are a few ideas:

  • Look at the load averages of the VM OS and of the JVM. If you are seeing high load values (20+), that is an indicator that too much is queued up to run.
  • Is there any way to simulate the load in a test environment so you can experiment with the thread pool sizes? If you run a simulated load with a pool size of X and then with X/2, you should be able to determine the optimal values.
  • Can you compare high-load times of day with lower-load times? Can you plot response latency over those periods to see whether there is a tipping point that suggests thrashing?
  • If you can simulate the load, make sure you are not just testing with the fire-hose method. You need a simulated load that you can ramp up and down. Start at 10% and slowly increase it while watching throughput and latency; you should be able to see a tipping point where throughput flattens out or otherwise degrades (a minimal ramp sketch follows after this list).
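
The ramp idea from the last bullet could look roughly like the sketch below (an editorial addition, not from the answer). Everything here is a placeholder: LoadRamp, doOneRequest, the client counts, and the step duration are assumptions, and a real harness would call the actual application instead of sleeping.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Steps the number of concurrent simulated clients up, holds each step for a
// fixed time, and prints throughput and mean latency so a tipping point
// (throughput flattening, latency climbing) becomes visible.
public class LoadRamp {
    public static void main(String[] args) throws InterruptedException {
        for (int clients = 10; clients <= 100; clients += 10) {
            runStep(clients, 30); // hold each load level for 30 seconds
        }
    }

    private static void runStep(final int clients, final int seconds) throws InterruptedException {
        final AtomicLong completed = new AtomicLong();
        final AtomicLong totalLatencyNanos = new AtomicLong();
        final long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(seconds);
        ExecutorService pool = Executors.newFixedThreadPool(clients);

        for (int i = 0; i < clients; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    while (System.nanoTime() < deadline) {
                        long start = System.nanoTime();
                        doOneRequest(); // placeholder: call the system under test here
                        totalLatencyNanos.addAndGet(System.nanoTime() - start);
                        completed.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(seconds + 10, TimeUnit.SECONDS);

        long n = completed.get();
        double throughput = n / (double) seconds;
        double meanLatencyMs = (n == 0) ? 0.0 : totalLatencyNanos.get() / 1000000.0 / n;
        System.out.println(clients + " clients: " + throughput + " req/s, mean latency "
                + meanLatencyMs + " ms");
    }

    // Stand-in for one request against the real application.
    private static void doOneRequest() {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```
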
+2

If you cannot run a profiler, I would recommend taking a thread dump or two and seeing what your threads are doing. Your application does not have to be stopped to do this.
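
On Linux you can get a thread dump without stopping the JVM by sending it SIGQUIT (kill -3 <pid>), which prints all thread stacks to the JVM's standard output; the JDK's jstack tool does much the same. As an illustrative sketch only (the class name is hypothetical), the same information can also be logged from inside the application via Thread.getAllStackTraces(), which exists since Java 5:

```java
import java.util.Map;

// Illustrative: print a stack trace for every live thread in this JVM.
public class ThreadDumper {
    public static void dump() {
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> entry : traces.entrySet()) {
            Thread thread = entry.getKey();
            System.out.println("\"" + thread.getName() + "\""
                    + " daemon=" + thread.isDaemon()
                    + " state=" + thread.getState());
            for (StackTraceElement frame : entry.getValue()) {
                System.out.println("    at " + frame);
            }
            System.out.println();
        }
    }

    public static void main(String[] args) {
        dump();
    }
}
```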

+2

Usually context switching between threads is computationally quite cheap, but with this many threads involved... you just cannot know. You say that upgrading to Java 1.6 EE is out of the question, but what about some hardware upgrades? That would likely provide a quick fix and should not be too expensive...

+1

For example, run a profiler on a similar machine. Then:

  • try a newer version of Java, 6 or 7. (It may make no difference, in which case do not worry about upgrading production.)
  • try CentOS 6.x.
  • try VMware.
  • try reducing the number of threads. You have only 8 cores.

Any, all, or none of the above may make a difference, but you will not know until you have a system you can test with a known, repeatable workload.

0

Source: https://habr.com/ru/post/1399407/

