Replication of load-related failures in non-production environments

Question

We run an in-house application on our intranet. After a recent update, we ran into a problem where IIS hangs at 100% CPU usage and requires a reset.

Rather than leave users stuck, we rolled back to the previous release while we work out a fix. The first step is to reproduce the problem, but we can't.

Here is some background:

Production has one virtual (VMware) web server with two processors and 2 GB of RAM. The database server has two CPUs and 4 GB of RAM. It also runs on VMware, but on separate physical hardware.

Under normal use, the application works fine. The w3wp.exe process typically uses between 5% and 20% CPU and about 200 MB of RAM. Both fluctuate a little under normal use, but nothing unusual.

However, when the problem starts, RAM usage rises sharply and the CPU is pegged at 98% (or as much as it can get). The site becomes unresponsive and IIS has to be restarted. Recycling the application pool does nothing in this situation; a full restart of IIS is required.

It does not happen at night, when the site is not in use. It happens more often when the site is under load, but it has also occurred during off-peak periods.

The first step to solving this problem is reproducing it. To simulate load, we started using JMeter. Our load script is based on actual usage at the time of the crash. With JMeter we can ramp usage up to 2-3 times the load seen during the crash, but the site behaves perfectly. CPU is high and the site gets sluggish, but memory usage stays reasonable and nothing hangs.
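As an illustration of the kind of load replay described above, here is a minimal Python sketch. It is not the JMeter script itself: the function names (`fetch`, `replay`, `summarize`), the URL list, and the thresholds are all hypothetical, and it only assumes that the recorded usage can be reduced to a list of URLs that is replayed some multiple of times across concurrent workers.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=30.0):
    """Fetch one URL and return (status, seconds). Status is None if no response arrived."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code   # the server answered, but with an error status
    except Exception:
        status = None       # timeout, refused connection, and so on
    return status, time.perf_counter() - start


def replay(urls, multiplier=1, concurrency=10, fetch_fn=fetch):
    """Replay every recorded URL `multiplier` times using a pool of worker threads."""
    work = [url for url in urls for _ in range(multiplier)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(fetch_fn, work))


def summarize(results, slow_seconds=5.0):
    """Count responses that failed (no answer or HTTP 5xx) or were slow."""
    failed = sum(1 for status, _ in results if status is None or status >= 500)
    slow = sum(1 for _, secs in results if secs >= slow_seconds)
    return {"total": len(results), "failed": failed, "slow": slow}


# Hypothetical usage against a test server (the host name is an assumption):
# recorded = ["http://test-server/app/page1.aspx", "http://test-server/app/page2.aspx"]
# print(summarize(replay(recorded, multiplier=3, concurrency=20)))
```

Passing the fetch function in as a parameter keeps the replay logic testable without a live server. A replay like this only reproduces request volume, so it may still miss whatever sequence of requests triggers the hang.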

Does anyone have tips on how to reproduce a problem like this in a non-production environment? We would really like to reproduce the error, determine the fix, and then test again to confirm that it is resolved. Along the way we found and improved a number of small things that may have solved the problem, but I would feel much more confident if we could reproduce it and then test the improved version.

Any tools, techniques or theories are greatly appreciated!

performance memory asp.net cpu crash

Darren Aug 13 '08 at 6:04

3 answers

Curt Hagenlocher · Answer 1 · 2008-08-13T06:13:21+0000

You can find some information on troubleshooting this problem in this blog post. Her blog is generally a good resource for debugging.

Jeff Atwood · Answer 2 · 2008-08-13T06:14:11+0000

I have an article about debugging ASP.NET in production, which may contain some pointers.

littlegeek · Answer 3 · 2008-08-13T07:26:33+0000

Is your test environment the same as live? I.e., two separate VM instances on two physical servers, with the same network connection and account types?

Are there any other instances on the database server?

Does IIS host any other web applications?

Is the .NET configuration correct?

Is the application pool configured correctly for its service accounts? Take a look at the MS article on optimising IIS 6 for performance.

It has lots of tips.
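The environment comparison suggested in this answer could be sketched as a simple settings diff. This is a minimal Python illustration under stated assumptions: the `diff_settings` helper and the setting names are hypothetical, and apart from the live web server's two CPUs and 2 GB of RAM (taken from the question), the values are made up for the example.

```python
def diff_settings(test, live):
    """Return {key: (test_value, live_value)} for every setting that differs.

    A setting missing from one environment is reported with None on that side.
    """
    keys = sorted(set(test) | set(live))
    return {k: (test.get(k), live.get(k)) for k in keys if test.get(k) != live.get(k)}


if __name__ == "__main__":
    # Hypothetical snapshots; only the live CPU and RAM figures come from the question.
    test_env = {"cpus": 2, "ram_gb": 4, "app_pool_identity": "NetworkService"}
    live_env = {"cpus": 2, "ram_gb": 2, "app_pool_identity": "NetworkService",
                "other_sites_on_iis": 1}
    for name, (test_value, live_value) in diff_settings(test_env, live_env).items():
        print(f"{name}: test={test_value!r} live={live_value!r}")
```

Any difference it reports (memory, extra sites on the same IIS instance, pool identity) is a candidate explanation for why the hang shows up in production but not under test.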
