<= is slower than <

I found several questions on SO about comparing the performance of < and <= (this one in particular), and I always found the same answer: there is no performance difference between them.

I wrote a program for comparison (the fiddle is not working; copy it to your machine to run it), in which I created two loops, for (int i = 0; i <= 1000000000; i++) and for (int i = 0; i < 1000000001; i++), in two different methods.

I ran each method 100 times, took the average time, and found that the loop with the <= operator is slower than the loop with <.
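
Since the fiddle is gone, here is a minimal sketch of the kind of program described; the method names and the Stopwatch-based Average helper are my own choices, not the original code:

using System;
using System.Diagnostics;

class Program
{
    const int Runs = 100;

    static void Main()
    {
        // Average wall-clock time over 100 runs of each method.
        Console.WriteLine("<=: " + Average(LessThanOrEqual) + " ms");
        Console.WriteLine("<:  " + Average(LessThan) + " ms");
    }

    // Both loops execute exactly 1,000,000,001 iterations.
    static void LessThanOrEqual()
    {
        for (int i = 0; i <= 1000000000; i++) { }
    }

    static void LessThan()
    {
        for (int i = 0; i < 1000000001; i++) { }
    }

    static double Average(Action body)
    {
        var sw = new Stopwatch();
        double total = 0;
        for (int run = 0; run < Runs; run++)
        {
            sw.Restart();
            body();
            sw.Stop();
            total += sw.Elapsed.TotalMilliseconds;
        }
        return total / Runs;
    }
}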

I ran the program several times, and <= always took more time. My results, in ms (<= time first, then <):

3018.73, 2778.22

2816.87, 2760.62

2859.02, 2797.05
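
For scale, a quick sketch (the RelativeGap helper is mine, not part of the original program) that computes how large these gaps are relative to the faster loop:

using System;

class RelativeGap
{
    static void Main()
    {
        // (<= time, < time) pairs from the runs above, in ms.
        var pairs = new[] { (3018.73, 2778.22), (2816.87, 2760.62), (2859.02, 2797.05) };
        foreach (var (lte, lt) in pairs)
            Console.WriteLine("{0:P1}", (lte - lt) / lt); // prints 8.7 %, 2.0 %, 2.2 %
    }
}

So the measured gap ranges from roughly 2% to 9%, which is worth keeping in mind for the discussion of noise thresholds in the answers below.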

My question is: if neither is faster, why do I see a difference in the results? Is there something wrong with my program?

+6
2 answers

Benchmarking is a fine art. What you are describing is physically impossible; the <= and < operators simply generate different processor instructions that execute at the same speed. I changed your program a bit, running DoIt ten times and dropping two zeros from the for() loop so I would not have to wait forever:

x86 jitter:

Less Than Equal To Method Time Elapsed: 0.5
Less Than Method Time Elapsed: 0.42
Less Than Equal To Method Time Elapsed: 0.36
Less Than Method Time Elapsed: 0.46
Less Than Equal To Method Time Elapsed: 0.4
Less Than Method Time Elapsed: 0.34
Less Than Equal To Method Time Elapsed: 0.33
Less Than Method Time Elapsed: 0.35
Less Than Equal To Method Time Elapsed: 0.35
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.34
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.31
Less Than Equal To Method Time Elapsed: 0.34
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.31
Less Than Method Time Elapsed: 0.32

x64 jitter:

Less Than Equal To Method Time Elapsed: 0.44
Less Than Method Time Elapsed: 0.4
Less Than Equal To Method Time Elapsed: 0.44
Less Than Method Time Elapsed: 0.45
Less Than Equal To Method Time Elapsed: 0.36
Less Than Method Time Elapsed: 0.35
Less Than Equal To Method Time Elapsed: 0.38
Less Than Method Time Elapsed: 0.34
Less Than Equal To Method Time Elapsed: 0.33
Less Than Method Time Elapsed: 0.34
Less Than Equal To Method Time Elapsed: 0.34
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.35
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.42
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.31
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.35

The only real signal you get from this is the slow execution of the first DoIt(), also visible in your test results: the warm-up cost of jitting the code. And the most important property of the signal is that it is noisy. The average value for both loops is approximately equal, and the standard deviation is quite large.
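
To make "the averages are about equal, the standard deviation is quite large" concrete, here is a small sketch (my own helper, not part of the answer's program) that computes both statistics from the x86 timings above:

using System;
using System.Linq;

class Stats
{
    static void Main()
    {
        // Per-run timings in ms from the x86 jitter output, split by method.
        double[] lessThanOrEqual = { 0.5, 0.36, 0.4, 0.33, 0.35, 0.32, 0.34, 0.32, 0.34, 0.31 };
        double[] lessThan        = { 0.42, 0.46, 0.34, 0.35, 0.32, 0.32, 0.32, 0.31, 0.32, 0.32 };

        Report("<=", lessThanOrEqual);
        Report("<",  lessThan);
    }

    static void Report(string name, double[] runs)
    {
        double mean = runs.Average();
        // Sample standard deviation (n - 1 in the denominator).
        double sd = Math.Sqrt(runs.Sum(t => (t - mean) * (t - mean)) / (runs.Length - 1));
        Console.WriteLine($"{name}: mean = {mean:F3} ms, stddev = {sd:F3} ms");
    }
}

The two means differ by a small fraction of either standard deviation, which is exactly what "indistinguishable from noise" means.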

Otherwise it is the kind of signal you always get from micro-benchmarks: code execution is not very deterministic. Beyond the .NET overhead, which is usually easy to eliminate, your program is not the only one running on your machine. It has to share the processor; the WriteLine() call alone already has an effect. It is executed by the conhost.exe process, which runs concurrently with your test while your test code has already entered the next for() loop. And everything else that happens on your machine, kernel code and interrupt handlers included, gets its turn as well.

And codegen can play a role; one thing you should try, for example, is simply swapping the two calls. The processor itself executes code in a very non-deterministic way in general. The state of the processor caches and the amount of history gathered by the branch prediction logic matter a great deal.

When I benchmark, I consider a difference of 15% or less not statistically significant. Hunting down differences smaller than that is quite difficult; you have to study the generated machine code very carefully. Silly things like a mis-aligned branch target, or a variable that does not get stored in a processor register, can cause big effects on execution time. Not something you can ever fix, either; the jitter does not have enough knobs to tweak.
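
For illustration (this is typical jitted x64 output, not the exact listing from any particular machine), the two loop conditions differ only in the condition code of a single conditional-jump instruction:

static void LessThan()
{
    // Typical back-edge codegen:
    //   inc  eax                  ; i++
    //   cmp  eax, 1000000001
    //   jl   LOOP_TOP             ; jump if less (signed)
    for (int i = 0; i < 1000000001; i++) { }
}

static void LessThanOrEqual()
{
    // Typical back-edge codegen:
    //   inc  eax                  ; i++
    //   cmp  eax, 1000000000
    //   jle  LOOP_TOP             ; jump if less or equal (signed)
    for (int i = 0; i <= 1000000000; i++) { }
}

// jl and jle are both single conditional branches with identical cost on
// every mainstream x86/x64 core, which is why the operators themselves
// cannot explain a measured difference.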

+10

First of all, there are many, many reasons to see variation in benchmarks, even when they are done correctly. Here are a few that come to mind:

  • Many other processes are running on your machine at the same time, context-switching in and out, and the operating system is constantly receiving and handling interrupts from various input/output devices, and so on. All of this can cause the machine to pause for periods of time that dwarf the running time of the code you are actually testing.
  • The JIT can detect when a function has been run a certain number of times and apply additional optimizations to it based on that information. Things like loop unrolling can drastically reduce the number of jumps a program has to make, and jumps are significantly more expensive than ordinary CPU operations (see the sketch after this list). Re-optimizing the instructions takes time the first time it happens, and speeds things up from that point on.
  • Your hardware tries to apply optimizations of its own, such as branch prediction, to keep its pipeline as full as possible. (If it guesses right, it can essentially go ahead and do the i++ while it waits to see whether the < or <= comparison succeeds, and then discard the result if it turns out it guessed wrong.) The impact of these optimizations depends on many factors and is not easy to predict.
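
To illustrate the loop-unrolling point above, here is a hand-unrolled sketch (the SumUnrolled helper is hypothetical, written for this illustration; the JIT applies this kind of transformation automatically to hot code):

static class UnrollDemo
{
    // Equivalent of: for (int i = 0; i < a.Length; i++) sum += a[i];
    // but with one compare-and-branch per four elements instead of one per element.
    static long SumUnrolled(int[] a)
    {
        long sum = 0;
        int i = 0;
        int limit = a.Length - (a.Length % 4);
        for (; i < limit; i += 4)
        {
            sum += a[i];
            sum += a[i + 1];
            sum += a[i + 2];
            sum += a[i + 3];
        }
        for (; i < a.Length; i++) // the remaining 0-3 elements
            sum += a[i];
        return sum;
    }
}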

Secondly, benchmarking is actually really hard to do well. Here is a benchmarking template I have used for a while. It is not perfect, but it does a pretty good job of ensuring that any patterns that emerge are unlikely to be based on execution order or pure chance:

/* This is a benchmarking template I use in LINQPad when I want to do a
 * quick performance test. Just give it a couple of actions to test and
 * it will give you a pretty good idea of how long they take compared
 * to one another. It's not perfect: you can expect a 3% error margin
 * under ideal circumstances. But if you're not going to improve
 * performance by more than 3%, you probably don't care anyway. */
void Main()
{
    // Enter setup code here
    var actions = new[]
    {
        new TimedAction("control", () =>
        {
            int i = 0;
        }),
        new TimedAction("<", () =>
        {
            for (int i = 0; i < 1000001; i++) {}
        }),
        new TimedAction("<=", () =>
        {
            for (int i = 0; i <= 1000000; i++) {}
        }),
        new TimedAction(">", () =>
        {
            for (int i = 1000001; i > 0; i--) {}
        }),
        new TimedAction(">=", () =>
        {
            for (int i = 1000000; i >= 0; i--) {}
        })
    };
    const int TimesToRun = 10000; // Tweak this as necessary
    TimeActions(TimesToRun, actions);
}

#region timer helper methods
// Define other methods and classes here
public void TimeActions(int iterations, params TimedAction[] actions)
{
    Stopwatch s = new Stopwatch();
    int length = actions.Length;
    var results = new ActionResult[actions.Length];
    // Perform the actions in their initial order.
    for (int i = 0; i < length; i++)
    {
        var action = actions[i];
        var result = results[i] = new ActionResult { Message = action.Message };
        // Do a dry run to get things ramped up/cached
        result.DryRun1 = s.Time(action.Action, 10);
        result.FullRun1 = s.Time(action.Action, iterations);
    }
    // Perform the actions in reverse order.
    for (int i = length - 1; i >= 0; i--)
    {
        var action = actions[i];
        var result = results[i];
        // Do a dry run to get things ramped up/cached
        result.DryRun2 = s.Time(action.Action, 10);
        result.FullRun2 = s.Time(action.Action, iterations);
    }
    results.Dump();
}

public class ActionResult
{
    public string Message { get; set; }
    public double DryRun1 { get; set; }
    public double DryRun2 { get; set; }
    public double FullRun1 { get; set; }
    public double FullRun2 { get; set; }
}

public class TimedAction
{
    public TimedAction(string message, Action action)
    {
        Message = message;
        Action = action;
    }
    public string Message { get; private set; }
    public Action Action { get; private set; }
}

public static class StopwatchExtensions
{
    public static double Time(this Stopwatch sw, Action action, int iterations)
    {
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
#endregion

Here is the result I get when I run this in LINQPad:

[screenshot: LINQPad benchmark results table]

So you will notice that there is some variation, particularly early on, but after everything has been run forwards and backwards enough times, no clear pattern emerges showing that one way is significantly faster or slower than the other.

+4

Source: https://habr.com/ru/post/985356/

