How to optimize JVMs and GC through load testing

Edit: From the several extremely generous and helpful answers this question has already received, it is obvious to me that I did not make an important part of it clear when I asked it earlier this morning. The answers I have received so far are about optimizing the application and eliminating bottlenecks at the code level. I am aware that this is far more important than trying to squeeze an extra 3 to 5% out of the JVM!

This question assumes that we have already done just about everything we can to optimize the application's architecture and code. Now we want more, and the next place to look is the JVM level and garbage collection; I have changed the question's title accordingly. Thanks again!


We have a basic pipeline-style architecture, where messages are passed from one component to the next and each component performs a different stage of processing along the way.

Components live inside WAR files deployed to Tomcat servers. In total we have about 20 components in the pipeline, spread across 5 different Tomcat servers (I did not choose the architecture or the distribution of WARs per server). We use Apache Camel to create all the routes between the components, effectively forming the “connective tissue” of the pipeline.
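
Just to illustrate the shape of this, a single hop wired up with Camel's Java DSL looks something like the sketch below; the endpoint URIs are made up for illustration, not taken from our actual deployment:

    // One illustrative hop of the pipeline, written in Camel's Java DSL.
    // The queue names are placeholders; the real pipeline has ~20 such hops.
    import org.apache.camel.builder.RouteBuilder;

    public class PipelineRoute extends RouteBuilder {
        @Override
        public void configure() {
            // feed the output of one component into the input of the next
            from("activemq:queue:componentA.out")
                .to("activemq:queue:componentB.in");
        }
    }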

I have been asked to optimize the GC and the overall performance of each server running a JVM (5 in total). I have spent several days reading about GC and performance tuning, and I now have a good handle on what each of the different JVM options does, how the heap is organized, and how most of the options affect overall JVM performance.
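
To be concrete, by “JVM options” I mean startup settings of roughly the following kind. The flag names are real HotSpot options; the values are placeholders, not recommendations:

    # set in each Tomcat's bin/setenv.sh (illustrative values only)
    CATALINA_OPTS="-Xms2g -Xmx2g \
      -XX:+UseConcMarkSweepGC \
      -XX:NewRatio=2 \
      -XX:+PrintGCDetails -Xloggc:gc.log"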

My thinking is that the best way to optimize each JVM is not to optimize it as a standalone unit. I “feel” (that is about as far as I can justify it!) that trying to optimize each JVM locally, without considering how it interacts with the JVMs on the other servers (both upstream and downstream), will not produce a globally optimized solution.

To me it makes sense to optimize the entire pipeline as a whole. So my first question is: do you agree, and if not, why not?

To do this, I was thinking of building a LoadTester that generates input and feeds it to the first endpoint in the pipeline. The LoadTester could also have a separate “monitor thread” that polls the last endpoint for throughput. We could then do all sorts of analysis: checking the average end-to-end transit time for messages, the maximum throughput before failure, and so on.
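
A minimal sketch of what I have in mind is below. It assumes, purely for illustration, that the first stage accepts an HTTP POST and the last stage exposes a processed-message counter; the hostnames, paths, and message format are all placeholders:

    // Minimal LoadTester sketch. Assumptions (mine, purely illustrative): the
    // first stage accepts an HTTP POST and the last stage exposes a counter of
    // processed messages. Hostnames, paths, and message format are placeholders.
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class LoadTester {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();

            // the separate "monitor thread": poll the last endpoint, report throughput
            Thread monitor = new Thread(() -> {
                long previous = 0;
                while (true) {
                    try {
                        Thread.sleep(10_000);
                        HttpRequest poll = HttpRequest.newBuilder(
                                URI.create("http://last-stage:8080/processedCount")).GET().build();
                        long total = Long.parseLong(
                                client.send(poll, HttpResponse.BodyHandlers.ofString()).body().trim());
                        System.out.printf("throughput: %.1f msg/s%n", (total - previous) / 10.0);
                        previous = total;
                    } catch (Exception e) {
                        return; // good enough for a test rig
                    }
                }
            });
            monitor.setDaemon(true);
            monitor.start();

            // feed the same message pattern into the first endpoint over and over;
            // embedding the send time lets the sink compute end-to-end transit time
            while (true) {
                HttpRequest send = HttpRequest.newBuilder(URI.create("http://first-stage:8080/in"))
                        .POST(HttpRequest.BodyPublishers.ofString(
                                "<msg sentAt='" + System.currentTimeMillis() + "'/>"))
                        .build();
                client.send(send, HttpResponse.BodyHandlers.discarding());
            }
        }
    }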

The LoadTester would generate the same message input pattern over and over again. The variable in the experiment would be the JVM options passed to each Tomcat server at startup. I have a list of about 20 different options I would like to pass to the JVMs, and I figured I could just keep adjusting their values until I arrive at near-optimal performance.
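
The sweep itself could be driven by something as simple as the sketch below. The restart script and the candidate option strings are hypothetical, named here only to show the loop:

    // Hypothetical sweep driver; restart-pipeline.sh is a made-up script that
    // would push CATALINA_OPTS to all 5 Tomcat servers and bounce them.
    import java.util.List;

    public class OptionSweep {
        public static void main(String[] args) throws Exception {
            List<String> candidates = List.of(
                    "-Xms2g -Xmx2g -XX:+UseParallelGC",
                    "-Xms2g -Xmx2g -XX:+UseConcMarkSweepGC",
                    "-Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:NewRatio=3");
            for (String opts : candidates) {
                new ProcessBuilder("./restart-pipeline.sh", opts)
                        .inheritIO().start().waitFor();
                // ...now run the LoadTester above and record its numbers against 'opts'
                System.out.println("tested: " + opts);
            }
        }
    }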

This may not be the best way to do it, but it is the best design I could come up with in the time I was given for this project (about a week).

Second question: what does SO think of this setup? How would SO approach this “optimization exercise” differently?

Last but not least, I am curious what metrics I could use as a basis for measurement and comparison. So far I can only think of:

  • Finding the JVM option configuration that gives the fastest average message transit time through the pipeline.
  • Finding the JVM option configuration that gives the maximum throughput without failure on any of the servers.

Any others? Any reasons why these two are bad?
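
For what it is worth, computing the first metric from whatever the sink records would be trivial; here is a sketch, assuming a log file with one transit time in milliseconds per line (the format is my assumption):

    // Sketch: average and worst-case transit time from the sink's log.
    // The input format (one duration in ms per line) is illustrative.
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    public class TransitStats {
        public static void main(String[] args) throws IOException {
            List<Long> times = Files.readAllLines(Path.of("transit-times.log"))
                    .stream().map(Long::parseLong).toList();
            double avg = times.stream().mapToLong(Long::longValue).average().orElse(0);
            long worst = times.stream().mapToLong(Long::longValue).max().orElse(0);
            System.out.printf("avg transit: %.1f ms, worst: %d ms, n=%d%n",
                    avg, worst, times.size());
        }
    }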

Reading this back, I can see how it could be interpreted as a monolithic question, but what I am really asking is how SO would go about optimizing a pipeline of JVMs, so feel free to tear my solution apart however you like.

Thanks in advance!

+6
3 answers

Let me step up a level and say that I did something similar in a big C application many years ago. It consisted of several processes exchanging messages across interconnected hardware. I came up with a two-step approach.

Step 1. In each process, I used a sampling technique to get rid of any wasteful activity. That took several days of sampling, reviewing the code, and repeating. The idea is that there is a chain, and the first thing to do is remove the inefficiency from the individual links.

Step 2. This part is slow going but effective: create time-stamped logs of the message traffic. Merge them together into a common timeline (a minimal merge sketch follows the list below). Then look closely at specific message sequences. What you are looking for is:

  • Was the message necessary at all, or was it a retransmission caused by a timeout or some other avoidable reason?
  • When was the message sent, received, and acted upon? If there is a significant delay between a message being received and being acted upon, what is the reason for that delay? Was it just a matter of “queueing” behind another process that happened to be doing I/O, for example? Could it be fixed with different process priorities?
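
The merge step itself is simple; here is a minimal sketch (in Java, since that is your stack), assuming each process logs lines that begin with an epoch-millisecond timestamp (the format is illustrative):

    // Sketch of merging per-process message logs into one global timeline.
    // Assumed line format: "<epochMillis> <process> <msgId> <event>" (illustrative).
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class TimelineMerge {
        public static void main(String[] args) throws IOException {
            List<String> all = new ArrayList<>();
            for (String file : args) {               // one log file per process
                all.addAll(Files.readAllLines(Path.of(file)));
            }
            all.sort(Comparator.comparingLong(line -> Long.parseLong(line.split(" ")[0])));
            all.forEach(System.out::println);        // one time-ordered view of all traffic
        }
    }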

One cycle of this activity took about a day: create the logs, merge them, find an opportunity for a speedup, and revise the code. All told, after about 10 working days I had found and fixed a number of problems, and the speed was dramatically improved.

What these two steps have in common is that I am not measuring or trying to collect “statistics”. If something is wasting a lot of time, that very fact exposes itself to a surprised programmer taking a close look at what is actually going on.

+1

I would start by looking for the recommended JVM settings, if any, for your particular hardware/software combination, or just start with what you already have.

Next, I would make sure I have monitoring in place to measure business throughput and SLAs.

I would not try to tune just the GC unless there is a reason to.

First you need to figure out what the main bottlenecks in your application are: whether it is I/O-bound, SQL-bound, and so on.

The key point here is: MEASURE, IDENTIFY the TOP bottlenecks, fix them, and run another iteration under the same repeatable load.

HTH...

0

The biggest trick I know of when running multiple JVMs on the same machine is to limit the number of cores the GC will use. Otherwise, what can happen is that when one JVM runs a full GC, it tries to grab every core, hurting the performance of all the JVMs even though the others are not running a GC. One suggestion is to limit the number of GC threads to 5/8 of the cores or less. (I cannot remember where I read that.)
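
Concretely, the HotSpot flags for this are along the following lines; the flag names are real, but the values are just an example for a box shared by several JVMs:

    # cap GC parallelism so one JVM's full GC cannot grab every core
    CATALINA_OPTS="$CATALINA_OPTS -XX:ParallelGCThreads=4 -XX:ConcGCThreads=2"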


I think you should test the system as a whole to ensure realistic interplay between the services. However, I suspect you may need to tune each service differently.

Changing the command-line options is useful if you cannot change the code. However, if you profile and optimize the code, you can make a far bigger difference than by adjusting the GC settings (after which you would have to tune them all over again).

For this reason, I would change the command-line options only as a last resort, after making whatever improvements can be made in the application code.

0

Source: https://habr.com/ru/post/905210/

