Why are performance measurements different?

I have a simple method that converts an array from one type to another. I wanted to find out which implementation is the fastest, but so far I get varying results from which I cannot conclude which version is really faster, or by what margin.

Since the conversion only involves allocating memory, reading the array and converting the values, I am surprised that the numbers are not more stable. I would like to know how to take accurate measurements that are meaningful and do not change from one day to the next. The differences are around 20% between days.

I am aware, of course, of the differences between the .NET 3.5 and 4.0 JIT compilers, between debug and release builds, of the fact that running the executable under a debugger disables JIT optimizations (unless you turn that behaviour off), and that the C# compiler generates different code for DEBUG and RELEASE (mainly nop operations and more temporary variables in the IL).
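
As a quick sanity check before timing anything, it can help to verify at run time that no debugger is attached and that the executing assembly was not built with the JIT optimizer disabled. A minimal sketch (the BuildInfo class and method name are only illustrative, not part of the original code):

using System;
using System.Diagnostics;
using System.Reflection;

static class BuildInfo
{
    // Warn if the benchmark is running under conditions where the JIT optimizer
    // may be disabled (debugger attached or a DEBUG build of the assembly).
    public static void WarnIfNotOptimized()
    {
        if (Debugger.IsAttached)
            Console.WriteLine("Warning: debugger attached, JIT optimizations may be disabled.");

        var dbg = (DebuggableAttribute)Attribute.GetCustomAttribute(
            Assembly.GetExecutingAssembly(), typeof(DebuggableAttribute));
        if (dbg != null && dbg.IsJITOptimizerDisabled)
            Console.WriteLine("Warning: DEBUG build, the JIT optimizer is disabled.");
    }
}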

using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace PerfTest
{
    class Program
    {
        const int RUNS = 10 * 1000 * 1000;

        static void Main(string[] args)
        {
            int[] array = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                                      21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
                                      39, 40, 41, 42, 43 };

            var s2 = Stopwatch.StartNew();
            for (int i = 0; i < RUNS; i++)
            {
                float[] arr = Cast(array);
            }
            s2.Stop();
            GC.Collect();

            var s3 = Stopwatch.StartNew();
            for (int i = 0; i < RUNS; i++)
            {
                float[] arr = Cast2(array);
            }
            s3.Stop();
            GC.Collect();

            var s4 = Stopwatch.StartNew();
            for (int i = 0; i < RUNS; i++)
            {
                var arr = CastSafe(array);
            }
            s4.Stop();

            Console.WriteLine("Times: {0} {1} {2}", s2.ElapsedMilliseconds, s3.ElapsedMilliseconds, s4.ElapsedMilliseconds);
        }

        // Reference cast implementation to check performance
        public static unsafe float[] Cast(int[] input)
        {
            int N = input.Length;
            float[] output = new float[N];
            fixed (int* pIStart = &input[0])
            {
                int* pI = pIStart;
                fixed (float* pOStart = &output[0])
                {
                    float* pO = pOStart;
                    for (int i = 0; i < N; i++)
                    {
                        *pO = (float)*pI;
                        pI++;
                        pO++;
                    }
                }
            }
            return output;
        }

        // Reference cast implementation to check performance
        public static unsafe float[] Cast2(int[] input)
        {
            int N = input.Length;
            float[] output = new float[N];
            fixed (int* pIStart = &input[0])
            {
                int* pI = pIStart;
                fixed (float* pOStart = &output[0])
                {
                    float* pO = pOStart;
                    for (int i = 0; i < N; i++)
                    {
                        pO[i] = (float)pI[i];
                    }
                }
            }
            return output;
        }

        public static float[] CastSafe(int[] input)
        {
            int N = input.Length;
            float[] output = new float[N];
            for (int i = 0; i < input.Length; i++)
            {
                output[i] = (float)input[i];
            }
            return output;
        }
    }
}

I then get:

  • Times: 1257 1388 1180
  • Times: 1331 1428 1267
  • Times: 1337 1435 1267
  • Times: 1208 1414 1145

From this I would conclude that the safe variant is faster than either unsafe variant, although bounds-check elimination in the unsafe methods should make them at least as fast, if not faster. Just for fun I also compiled the same IL via LCG (DynamicMethod), which turned out to be even slower than any of these methods, although the additional cost of the delegate invocation does not seem to play a big role here.
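
The LCG variant is not shown in the question; a DynamicMethod that emits roughly the same IL as CastSafe could be built like the following sketch (class name and exact IL are illustrative, not the original code). The resulting delegate can then be timed in the same loop as the other variants.

using System;
using System.Reflection.Emit;

static class LcgCast
{
    // Build a DynamicMethod with roughly the same IL as CastSafe and return it as a delegate.
    public static Func<int[], float[]> Create()
    {
        var dm = new DynamicMethod("CastLcg", typeof(float[]), new[] { typeof(int[]) }, typeof(LcgCast).Module);
        ILGenerator il = dm.GetILGenerator();
        LocalBuilder output = il.DeclareLocal(typeof(float[]));
        LocalBuilder i = il.DeclareLocal(typeof(int));
        Label check = il.DefineLabel();
        Label body = il.DefineLabel();

        il.Emit(OpCodes.Ldarg_0);                  // output = new float[input.Length]
        il.Emit(OpCodes.Ldlen);
        il.Emit(OpCodes.Conv_I4);
        il.Emit(OpCodes.Newarr, typeof(float));
        il.Emit(OpCodes.Stloc, output);
        il.Emit(OpCodes.Ldc_I4_0);                 // i = 0
        il.Emit(OpCodes.Stloc, i);
        il.Emit(OpCodes.Br, check);

        il.MarkLabel(body);                        // output[i] = (float)input[i]
        il.Emit(OpCodes.Ldloc, output);
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Ldelem_I4);
        il.Emit(OpCodes.Conv_R4);
        il.Emit(OpCodes.Stelem_R4);
        il.Emit(OpCodes.Ldloc, i);                 // i++
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, i);

        il.MarkLabel(check);                       // loop while i < input.Length
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldlen);
        il.Emit(OpCodes.Conv_I4);
        il.Emit(OpCodes.Blt, body);

        il.Emit(OpCodes.Ldloc, output);            // return output
        il.Emit(OpCodes.Ret);

        return (Func<int[], float[]>)dm.CreateDelegate(typeof(Func<int[], float[]>));
    }
}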

The for loop executes this code 10 million times, which should produce stable results. Why do I see any differences at all? Running the executable with real-time priority did not help either (psexec -realtime executable). How can I get reliable numbers?
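
For reference, process and thread priority (and the CPU affinity) can also be set from inside the process rather than via psexec; a small sketch, where BenchSetup is just an illustrative name and not part of the original code:

using System;
using System.Diagnostics;
using System.Threading;

static class BenchSetup
{
    // Pin the benchmark to one core and raise priorities from inside the process.
    // High priority is usually a safer choice than RealTime, which can starve the system.
    public static void Apply()
    {
        Process p = Process.GetCurrentProcess();
        p.PriorityClass = ProcessPriorityClass.High;
        p.ProcessorAffinity = (IntPtr)1;            // run on CPU 0 only
        Thread.CurrentThread.Priority = ThreadPriority.Highest;
    }
}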

My tests included:

  • Dual and quad core machines
  • Windows 7 32/64 bit
  • .NET Framework 3.5 / 4.0
  • 32 and 64 bit builds of the executable

I am not sure whether a profiler would distort the measurements even more. Since it interrupts my application from time to time to collect call stacks, it will certainly destroy any cache locality that might be helping performance. So if there is an approach with better (deterministic) cache locality, I will not be able to find it with a profiler.

Edit 1: To account for the fact that I do not have a real-time operating system, I am now sampling my measurements. Since the Windows scheduler gives a single thread a time slice of about 15 ms, I can stay clear of the scheduler if a measurement takes less than 15 ms. But if a measurement is too short, I only get a few stopwatch ticks, which does not tell me much.

To get stable values I need a time interval long enough for the OS to do whatever it does on a regular basis. Empirical tests have shown that 30 seconds or more is a good duration for a single measurement.

This interval is then divided into sample intervals that are well below 15 ms. I then get timing information for each sample, and from the samples I can extract the min/max and the average. This way I can also see first-time initialization effects. The code now looks like this:

// Requires: using System; using System.Diagnostics; using System.Linq;
class Program
{
    const int RUNS = 100 * 1000 * 1000;  // 100 million runs will take about 30s
    const int RunsPerSample = 100;       // 100 runs for one sample is about 0.01 ms << 15 ms

    static void Main(string[] args)
    {
        int[] array = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                                  21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
                                  39, 40, 41, 42, 43 };
        long[] sampleTimes = new long[RUNS / RunsPerSample];
        int sample = 0;
        for (int i = 0; i < RUNS; i += RunsPerSample)
        {
            var sw = Stopwatch.StartNew();
            for (int j = i; j < i + RunsPerSample; j++)
            {
                float[] arr = Cast(array);
            }
            sw.Stop();
            sampleTimes[sample] = sw.ElapsedTicks;
            sample++;
        }
        Console.WriteLine("SampleSize: {0}, Min {1}, Max {2}, Average {3}",
            RunsPerSample, sampleTimes.Min(), sampleTimes.Max(), sampleTimes.Average());
    }

    // Cast(), Cast2() and CastSafe() are unchanged from the first listing.
}

The values from these tests still vary (< 10%), but I think that if I build a histogram of the values and drop the top 10%, which are most likely caused by the OS, the GC, and so on, I can get stable numbers that I can trust.

  • SampleSize: 100, Min 25, Max 86400, Average 28.614631
  • SampleSize: 100, Min 24, Max 86027, Average 28.762608
  • SampleSize: 100, Min 25, Max 49523, Average 32.102037
  • SampleSize: 100, Min 24, Max 48687, Average 32.030088
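
One straightforward way to implement the "drop the slowest samples" idea mentioned above is to sort the sample ticks and average only the fastest fraction; a sketch, where TrimmedAverage is a hypothetical helper that is not part of the code above:

using System;
using System.Linq;

static class SampleStats
{
    // Drop the slowest fraction of the samples (assumed to be OS/GC noise)
    // and report the average of the remaining ones.
    public static double TrimmedAverage(long[] sampleTicks, double dropFraction)
    {
        long[] sorted = sampleTicks.OrderBy(t => t).ToArray();
        int keep = Math.Max(1, (int)(sorted.Length * (1.0 - dropFraction)));
        return sorted.Take(keep).Average();
    }
}

It would be called as SampleStats.TrimmedAverage(sampleTimes, 0.10) after the sampling loop.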

Edit 2: The histograms show that the measured values are not random. They look like a Landau distribution, which should give me the right fitting algorithms to approximate stable values. I wish something like ROOT existed for .NET, where I could interactively fit a suitable distribution function to my data and get the results.

[Histogram of measured values]

Below is the code for generating a histogram using MSChart:

using System.Collections.Generic;
using System.Drawing;
using System.Linq;
using System.Windows.Forms;
using System.Windows.Forms.DataVisualization.Charting;

namespace ConsoleApplication4
{
    public partial class Histogram : Form
    {
        public Histogram(long[] sampleTimes)
        {
            InitializeComponent();

            Series histogramSeries = cHistogram.Series.Add("Histogram");

            // Set new series chart type and other attributes
            histogramSeries.ChartType = SeriesChartType.Column;
            histogramSeries.BorderColor = Color.Black;
            histogramSeries.BorderWidth = 1;
            histogramSeries.BorderDashStyle = ChartDashStyle.Solid;

            var filtered = RemoveHighValues(sampleTimes, 40);
            KeyValuePair<long, int>[] histoData = GenerateHistogram(filtered);

            ChartArea chartArea = cHistogram.ChartAreas[histogramSeries.ChartArea];
            chartArea.AxisY.Title = "Frequency";
            chartArea.AxisX.Minimum = histoData.Min(x => x.Key);
            chartArea.AxisX.Maximum = histoData.Max(x => x.Key);

            foreach (var v in histoData)
            {
                histogramSeries.Points.Add(new DataPoint(v.Key, v.Value));
            }

            chartArea.AxisY.Minimum = 0;
            chartArea.AxisY.Maximum = histoData[0].Value + 100;
        }

        // Count the occurrence of each value of input and return an array with the value as key
        // and its count as value, ordered starting with the highest counts.
        KeyValuePair<long, int>[] GenerateHistogram(long[] input)
        {
            Dictionary<long, int> counts = new Dictionary<long, int>();
            foreach (var value in input)
            {
                int old = 0;
                if (!counts.TryGetValue(value, out old))
                {
                    counts[value] = 0;
                }
                counts[value] = ++old;
            }

            var orderedCounts = (from x in counts
                                 orderby x.Value descending
                                 select x).ToArray();
            return orderedCounts;
        }

        long[] RemoveHighValues(long[] input, int maxDifference)
        {
            var min = input.Min();
            var max = input.Max();
            long[] filtered = input;
            while (max - min > maxDifference) // remove all values which differ by more than maxDifference ticks
            {
                filtered = input.Where(x => x < max).ToArray();
                max = filtered.Max();
            }
            return filtered;
        }
    }
}
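
For completeness, the form could be wired to the benchmark roughly like this (hypothetical glue code; it assumes the project references Windows Forms and that the Chart control cHistogram exists in the designer file):

// In the console program, after the sampling loop has filled sampleTimes:
System.Windows.Forms.Application.EnableVisualStyles();
System.Windows.Forms.Application.Run(new Histogram(sampleTimes));
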
4 answers

I did answer my original question with the finding that the numbers are not random but follow a distribution (it looks like a Landau distribution), for which I can use fitting algorithms to get the peak value, which is the most probable true execution time.
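
In the absence of a real fitting library, a crude stand-in for the peak of the distribution is simply the most frequent tick count among the samples; a sketch (MostProbableTicks is an illustrative name, not from the original code):

using System.Linq;

static class PeakEstimate
{
    // Crude approximation of the distribution peak: the tick count that occurs
    // most often among the samples is taken as the most probable true time.
    public static long MostProbableTicks(long[] sampleTicks)
    {
        return sampleTicks
            .GroupBy(t => t)
            .OrderByDescending(g => g.Count())
            .First()
            .Key;
    }
}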

You are talking about an average difference of about a hundredth of a nanosecond per method call. Windows does not claim to be a real-time OS; these measurements are about as stable as you are going to get.

And by the way, the JIT compiler will eliminate the bounds check inside your CastSafe method. I would be very surprised if you could find anything faster.

(If the CPU is the bottleneck, you may be able to improve performance with Parallel.For rather than a plain for loop, but to determine that you would need to test against real-world data. For example, cache behaviour will differ enormously between an array of 43 ints and an array of 43,000,000 ints.)
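
A minimal sketch of such a Parallel.For variant (requires .NET 4.0; only worthwhile for arrays much larger than the 43-element one used here):

using System.Threading.Tasks;

public static float[] CastParallel(int[] input)
{
    float[] output = new float[input.Length];
    // Each index is independent, so the range can be split across cores.
    // For tiny arrays the partitioning overhead will dominate the conversion itself.
    Parallel.For(0, input.Length, i =>
    {
        output[i] = (float)input[i];
    });
    return output;
}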

I assume this would also work with Mono under Linux? To exclude the influence of the multitasking environment, you can run any program with

 time program 

and get a measurement of how much CPU time your program used.

You would also measure the warm-up phase and the load time, but with enough iterations in the loop that should not be a big problem. Maybe there is an equivalent tool on Windows?
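
On Windows, a rough equivalent is to read the process's own CPU time through the Process class; a sketch:

using System;
using System.Diagnostics;

class CpuTimeDemo
{
    static void Main()
    {
        Process self = Process.GetCurrentProcess();
        TimeSpan before = self.TotalProcessorTime;

        // ... run the benchmark loop here ...

        self.Refresh();   // discard cached process information
        TimeSpan after = self.TotalProcessorTime;
        Console.WriteLine("CPU time used: {0} ms", (after - before).TotalMilliseconds);
    }
}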

Stopwatch is not that accurate; try using HighResClock:

http://netcode.ru/dotnet/?lang=&katID=30&skatID=261&artID=7113
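
Whichever clock is used, it is worth checking what resolution Stopwatch actually provides on the machine; a small sketch using the built-in Stopwatch fields:

using System;
using System.Diagnostics;

class TimerInfo
{
    static void Main()
    {
        // Report whether the high-resolution performance counter backs Stopwatch
        // and how many nanoseconds one Stopwatch tick corresponds to.
        Console.WriteLine("High resolution: {0}", Stopwatch.IsHighResolution);
        Console.WriteLine("Frequency: {0} ticks/s", Stopwatch.Frequency);
        Console.WriteLine("Resolution: {0:F1} ns/tick", 1e9 / Stopwatch.Frequency);
    }
}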

Don't expect the measurements to be accurate to the nanosecond; as someone else wrote, Windows 7 is not a real-time OS.

Also, after GC.Collect() you might want to call GC.WaitForPendingFinalizers();
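
A common settling pattern between timed sections is to collect, wait for finalizers, and then collect again so that objects released by those finalizers are also reclaimed; a sketch:

// Settle the GC before starting the next timed section.
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();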

Source: https://habr.com/ru/post/893429/

