Here's the scenario:
I run my Java / Spring application on Amazon EC2 Linux instances behind a load balancer, starting with three servers that can scale up or down as needed.
Scale-up criteria: when CPU usage exceeds 30% for more than 10 minutes, add 2 servers.
Scale-down criteria: when CPU usage stays below 15% for more than 10 minutes, remove one server.
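To make the two rules above unambiguous, here is a minimal sketch of them as a decision function. It only illustrates the rules as stated and is not the actual Auto Scaling configuration. The class name `ScalingRuleSketch`, the method `decide`, and the assumption that CPU is sampled once per minute (so "more than 10 minutes" is approximated as 10 consecutive samples) are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of the scaling rules described in the question.
public class ScalingRuleSketch {
    static final double SCALE_UP_CPU = 30.0;   // percent, from the scale-up rule
    static final double SCALE_DOWN_CPU = 15.0; // percent, from the scale-down rule
    static final int WINDOW_MINUTES = 10;      // assumption: one CPU sample per minute

    /**
     * Returns the change in server count (+2, -1, or 0) based on the
     * most recent WINDOW_MINUTES per-minute CPU samples.
     */
    public static int decide(List<Double> cpuPerMinute) {
        int n = cpuPerMinute.size();
        if (n < WINDOW_MINUTES) {
            return 0; // not enough history to evaluate either rule
        }
        List<Double> window = cpuPerMinute.subList(n - WINDOW_MINUTES, n);
        boolean allAbove = window.stream().allMatch(c -> c > SCALE_UP_CPU);
        boolean allBelow = window.stream().allMatch(c -> c < SCALE_DOWN_CPU);
        if (allAbove) {
            return 2;  // sustained high CPU: add two servers
        }
        if (allBelow) {
            return -1; // sustained low CPU: remove one server
        }
        return 0;      // mixed or in-between readings: no change
    }

    public static void main(String[] args) {
        // CPU held at about 33% for ten minutes, as in the test run.
        System.out.println(decide(Collections.nCopies(10, 33.0)));
        // CPU at about 20% after scaling: neither rule applies.
        System.out.println(decide(Collections.nCopies(10, 20.0)));
        // A single dip below 30% inside the window blocks the scale-up.
        System.out.println(decide(Arrays.asList(
                33.0, 33.0, 33.0, 29.0, 33.0, 33.0, 33.0, 33.0, 33.0, 33.0)));
    }
}
```

Note that at roughly 20% CPU neither rule applies, so under these rules the group would stay at five servers for the rest of the test.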
Load (using blazemeter.com): increase the number of users steadily from 0 to 50 over 15 minutes, then hold it constant from there.
Result:
- In the first 15 minutes, the load ramped up to 50 hits per second and then held steady for another 5 minutes. CPU usage stayed around 30%, and response times were below 20 ms at this point.
- With the load still at 50 hits per second, about 20 minutes from the start, CPU usage rose to about 33% and stayed there for more than 10 minutes, which triggered a scale-up. The response time increased sharply, varying between 5,000 ms and 15,000 ms.
- With two additional servers (five in total), CPU usage dropped to 20%, but the response time showed no sign of recovering. It stayed between 5,000 ms and 15,000 ms for the rest of the test, until the load was removed.
My question is: why did the response time not return to normal (about 20 ms) once CPU usage came back down to a normal level (about 20%)?
CPU utilization chart

Response time chart
Thank you for your time :)
James