Avoiding branches in managed languages

In C, when compiling for x86, I usually replace branches with a logical expression when speed matters most, even when the conditions are complex. For example, instead of:

    char isSomething() {
        if (complexExpression01) {
            if (complexExpression02) {
                if (!complexExpression03) {
                    return 1;
                }
            }
        }
        return 0;
    }

I would write:

    char isSomething() {
        return complexExpression01 && complexExpression02 && !complexExpression03;
    }

Granted, the less readable code can be harder to maintain, but it can also be faster.

Is there any reason to do the same when working with managed code such as C#? Are branches as expensive in managed code as they are in unmanaged code (at least on x86)?

+6
4 answers

The two expressions will result in the same number of tests, since the logical AND operator (&&) has short-circuit semantics in both C and C#. So the premise of your question (that the second way of expressing the program leads to less branching) is incorrect.
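
To make that concrete: if the goal really were fewer branches, C# does offer the non-short-circuiting & operator, which on bool operands evaluates everything unconditionally. A minimal sketch (the ComplexExpression* methods are placeholders, not from the question):

    static bool ComplexExpression01() { return true; }
    static bool ComplexExpression02() { return true; }
    static bool ComplexExpression03() { return false; }

    // Short-circuit form: && stops at the first false operand,
    // which implies a conditional branch per operand.
    static bool IsSomethingShortCircuit()
    {
        return ComplexExpression01() && ComplexExpression02() && !ComplexExpression03();
    }

    // Eager form: & on bools always evaluates all three operands and
    // combines them with a plain AND instead of branching.
    static bool IsSomethingEager()
    {
        return ComplexExpression01() & ComplexExpression02() & !ComplexExpression03();
    }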

+1

In general

With your average compiler, the generated code will most often be identical, at least assuming you use the usual

    csc.exe /optimize+
    cl.exe /O2
    g++ -O2

and comparable default optimization modes.

The usual mantra applies: profile, profile, profile (and don't micro-optimize until the profiler tells you to). You can always look at the generated code [2] to see whether there is room for improvement.

Think of it this way for your example C# code:

C# / .NET

Each of your complex expressions is effectively a function call (a call, calli, or callvirt opcode [3]) that requires its arguments to be pushed onto the stack. On exit, the return value is left on the stack in place of the parameters.

Now, since the CLR is a stack-based virtual machine (i.e. it has no registers), a named local variable is exactly the same as an anonymous temporary on the stack; the only difference is the number of identifiers used in the code.
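
A minimal illustration of this (my example, not from the original answer): under csc /optimize+, both methods below compile to the same stack-based IL; the named local merely adds an identifier.

    static int SquarePlusOneNamed(int x)
    {
        int square = x * x;   // named local
        return square + 1;
    }

    static int SquarePlusOneAnonymous(int x)
    {
        return x * x + 1;     // anonymous temporary on the evaluation stack
    }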

What the JIT engine does with this is another matter: the JIT engine translates these calls into native assembly and may optimize there, tuning register allocation, instruction ordering, branch prediction, and so on [1].

[1] (Although in practice for this sample it probably won't be allowed to do the more interesting optimizations, because the complex function calls may have side effects, and the C# specification is very clear about evaluation order and the like.) Note, however, that the JIT engine is allowed to inline function calls to reduce the call overhead.

Not only when they are not virtual, but (IIRC) also when the runtime type can be statically known at JIT time, due to certain .NET internals. I'd have to look up a reference for this, but I believe .NET Framework 4.0 introduced attributes that explicitly prevent inlining of framework functions; this is so Microsoft can patch library code in service packs/updates even when user assemblies were compiled ahead of time (ngen.exe) into native images.
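
Whatever the exact servicing attributes are, user code can steer the JIT's inlining decisions through MethodImplAttribute. A small sketch (the Check methods are placeholders):

    using System.Runtime.CompilerServices;

    static class InliningHints
    {
        // The JIT will never inline calls to this method; the call
        // overhead stays, but the body remains independently patchable.
        [MethodImpl(MethodImplOptions.NoInlining)]
        public static bool CheckNoInline(int x)
        {
            return x > 0;
        }

        // .NET 4.5 and later: ask the JIT to inline even bodies that
        // exceed its usual size heuristics.
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static bool CheckInline(int x)
        {
            return x > 0;
        }
    }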

C / C++

In C/C++ the memory model is much more lenient (at least until C++11, that is) and the code typically compiles down to inline instructions directly. Add to that the fact that C/C++ compilers usually do aggressive inlining, and the code on such compilers will usually come out the same, unless you compile with optimization disabled.


[2] I use:

  • monodis to view the generated IL code
  • mono --aot=full,static or mkbundle to produce native object modules, and objdump -CdS to view the annotated native instructions (example invocations below)
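
For instance, an inspection session along these lines (Program.exe is a placeholder name; on Linux, mono --aot emits a Program.exe.so image next to the assembly):

    monodis Program.exe > Program.il
    mono --aot=full Program.exe
    objdump -CdS Program.exe.so | less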

Note that this is purely out of curiosity, because it's rare that I find interesting bottlenecks there. See, however, Jon Skeet's blog posts on Noda.NET performance for good examples of the surprises that may hide in the IL generated for everyday classes.

[3] Edit: this is inaccurate for operators compiled as intrinsics, although even those simply leave their result on the stack.

+4

It depends on the CLR implementation and on the managed language's compiler. In C#, the following test case shows that there is no difference in the generated instructions between nested if statements and a combined if statement:

    // case 1
    if (value1 < value2)
    00000089  mov   eax,dword ptr [ebp-0Ch]
    0000008c  cmp   eax,dword ptr [ebp-10h]
    0000008f  jge   000000A6
    {
        if (value2 < value3)
    00000091  mov   eax,dword ptr [ebp-10h]
    00000094  cmp   eax,dword ptr [ebp-14h]
    00000097  jge   000000A6
        {
            result1 = true;
    00000099  mov   eax,1
    0000009e  and   eax,0FFh
    000000a3  mov   dword ptr [ebp-4],eax
        }
    }

    // case 2
    if (value1 < value2 && value2 < value3)
    000000a6  mov   eax,dword ptr [ebp-0Ch]
    000000a9  cmp   eax,dword ptr [ebp-10h]
    000000ac  jge   000000C3
    000000ae  mov   eax,dword ptr [ebp-10h]
    000000b1  cmp   eax,dword ptr [ebp-14h]
    000000b4  jge   000000C3
    {
        result2 = true;
    000000b6  mov   eax,1
    000000bb  and   eax,0FFh
    000000c0  mov   dword ptr [ebp-8],eax
    }
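
For reference, C# along the following lines would produce that listing (a reconstruction from the interleaved source lines in the disassembly; the initial values are assumptions):

    int value1 = 1, value2 = 2, value3 = 3;
    bool result1 = false, result2 = false;

    // case 1: nested ifs
    if (value1 < value2)
    {
        if (value2 < value3)
        {
            result1 = true;
        }
    }

    // case 2: combined condition
    if (value1 < value2 && value2 < value3)
    {
        result2 = true;
    }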
+2

The only way to know is to measure.

True and false are represented by the CLR as 1 and 0, so I wouldn't be surprised if using logical expressions had some advantage. Let's have a look:

    static void BenchBranch()
    {
        Stopwatch sw = new Stopwatch();
        const int NMAX = 1000000000;

        bool a = true;
        bool b = false;
        bool c = true;

        sw.Restart();
        int sum = 0;
        for (int i = 0; i < NMAX; i++)
        {
            if (a) if (b) if (c) sum++;
            a = !a; b = a ^ b; c = b;
        }
        sw.Stop();
        Console.WriteLine("1: {0:F3} ms ({1})", sw.Elapsed.TotalMilliseconds, sum);

        sw.Restart();
        sum = 0;
        for (int i = 0; i < NMAX; i++)
        {
            if (a && b && c) sum++;
            a = !a; b = a ^ b; c = b;
        }
        sw.Stop();
        Console.WriteLine("2: {0:F3} ms ({1})", sw.Elapsed.TotalMilliseconds, sum);

        sw.Restart();
        sum = 0;
        for (int i = 0; i < NMAX; i++)
        {
            sum += (a && b && c) ? 1 : 0;
            a = !a; b = a ^ b; c = b;
        }
        sw.Stop();
        Console.WriteLine("3: {0:F3} ms ({1})", sw.Elapsed.TotalMilliseconds, sum);
    }

Result:

    1: 2713.396 ms (250000000)
    2: 2477.912 ms (250000000)
    3: 2324.916 ms (250000000)

So from this there seems to be a slight advantage to using logical operators instead of nested if statements, though any particular case may give slightly different results.
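
A fourth variant worth measuring (my addition, not benchmarked above) replaces && with the non-short-circuiting &, which drops the early-out branches entirely:

    sw.Restart();
    sum = 0;
    for (int i = 0; i < NMAX; i++)
    {
        // & evaluates all three operands; no short-circuit branches are emitted.
        sum += (a & b & c) ? 1 : 0;
        a = !a; b = a ^ b; c = b;
    }
    sw.Stop();
    Console.WriteLine("4: {0:F3} ms ({1})", sw.Elapsed.TotalMilliseconds, sum);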

In the end, whether a micro-optimization like this is worth it depends on how performance-critical the code is.

0

Source: https://habr.com/ru/post/901002/

