A statistical method for knowing when enough iterations of a performance test have been run

I am writing a performance / load-testing service. Imagine a test function as follows:

bytesPerSecond = test(filesize: 10MB, concurrency: 5) 

Using this, I will fill out a table of results for different sizes and levels of concurrency. There are other variables, but you get the idea.

The test function issues concurrent requests and tracks bandwidth. The measured speed starts from zero, then bursts and dips, until it eventually stabilizes at the “true” value.

However, it may take some time to achieve this stability, and there are many input combinations to evaluate.

How can the test function decide when it has done enough fetching? By "enough" I mean that the result would not change beyond some margin of error if testing continued.

I remember reading an article about this a while ago (by one of the authors of jsperf) that discussed a reliable method, but I can no longer find it.

One simple method would be to calculate the standard deviation over a sliding window of values. Is there a better approach?
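
For concreteness, here is roughly what I have in mind; the window size of 20 and the 2% threshold are arbitrary illustrative values:

    from statistics import mean, pstdev

    def stable_by_stddev(samples, window=20, threshold=0.02):
        """True once the stddev of the last `window` samples drops below
        `threshold` times their mean, i.e. the readings have settled."""
        recent = samples[-window:]
        if len(recent) < window:
            return False  # window not yet full
        return pstdev(recent) < threshold * mean(recent)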

2 answers

IIUC, you are describing the classic problem of estimating a confidence interval for a mean with unknown variance. That is, suppose you have n results x₁, ..., xₙ, where each xᵢ is a sample from some process you know little about: not the mean, not the variance, not even the shape of the distribution. For some required confidence level, you want n to be large enough that, with that confidence, the true mean lies within an interval around your sample mean.

(Note that under relatively weak conditions, the Central Limit Theorem ensures that the sample mean converges to a normal distribution, but to apply it directly you need the variance.)

So, in this case, the classic way to determine whether n is large enough is as follows:

  • Start by calculating the sample mean μ = Σᵢ xᵢ / n. Also calculate the normalized sample variance s² = Σᵢ (xᵢ − μ)² / (n − 1).

  • Depending on the size of n:

    • If n > 30, the confidence interval is approximated as μ ± z_{α/2} · s / √n, where z_{α/2} is the standard normal quantile for your chosen confidence level α.

    • If n < 30, the confidence interval is approximated as μ ± t_{α/2} · s / √n, where t_{α/2} is the quantile of Student's t distribution with n − 1 degrees of freedom (tabulated in any statistics reference).

  • If the confidence interval is tight enough, stop. Otherwise, increase n. (A sketch of this loop follows below.)
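
A minimal sketch of this stopping rule in Python. The run_once callback, the 5% relative-width target, and the bounds min_n / max_n are illustrative assumptions of mine; the z/t switchover at n = 30 is the rule from the list above, with quantiles taken from scipy:

    import math
    from statistics import mean, stdev
    from scipy.stats import norm, t  # standard normal and Student's t quantiles

    def measure_until_confident(run_once, confidence=0.95, rel_width=0.05,
                                min_n=5, max_n=1000):
        """Sample run_once() until the confidence interval for the mean is
        narrower than rel_width * mean, or max_n samples are reached."""
        alpha = 1.0 - confidence
        samples, half_width = [], float("inf")
        while len(samples) < max_n:
            samples.append(run_once())  # one test iteration (bytes/sec)
            n = len(samples)
            if n < min_n:
                continue  # too few points to judge yet
            mu, s = mean(samples), stdev(samples)  # sample mean and stddev
            # z quantile for n > 30, Student's t with n - 1 dof otherwise
            q = norm.ppf(1 - alpha / 2) if n > 30 else t.ppf(1 - alpha / 2, n - 1)
            half_width = q * s / math.sqrt(n)
            if half_width <= rel_width * abs(mu):
                break  # interval tight enough: stop sampling
        return mean(samples), half_width

A hypothetical call for the question's example would be measure_until_confident(lambda: test(filesize=10_000_000, concurrency=5)).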


Stability means that the rate of change (derivative) is zero or close to zero.

The test function issues concurrent requests and tracks bandwidth. The speed starts from zero, then bursts and dips, until it eventually stabilizes at the “true” value.

I would keep track of your past bandwidth values, say the last X of them. From these values I would calculate the rate of change (the derivative of your bandwidth). If the derivative is close to zero, your test is stable, and I would stop the test.

How to find X? I think that instead of a constant value like 10, choosing it according to the maximum number of tests might be more appropriate, for example (a sketch combining both pieces follows the formula):

  X = max(10, max_test_count * 0.01)
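
A rough sketch of this in Python, assuming one bandwidth reading per iteration; the near-zero threshold eps is an illustrative choice of mine, while the window rule is the formula above:

    def is_stable(history, window, eps=0.01):
        """True when the mean first difference (a discrete derivative) of the
        last `window` bandwidth samples is near zero relative to their level."""
        if len(history) < window:
            return False  # not enough data yet
        recent = history[-window:]
        diffs = [b - a for a, b in zip(recent, recent[1:])]
        slope = sum(diffs) / len(diffs)  # average rate of change
        level = sum(recent) / len(recent)
        return abs(slope) <= eps * abs(level)

    def run_test(run_once, max_test_count=1000):
        window = max(10, int(max_test_count * 0.01))  # X, chosen as above
        history = []
        for _ in range(max_test_count):
            history.append(run_once())  # one bandwidth measurement
            if is_stable(history, window):
                break  # derivative ~ 0: stable, stop early
        return sum(history[-window:]) / len(history[-window:])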

Source: https://habr.com/ru/post/989403/

