Stackoverflow does boxing in c #

I have these two pieces of code in C #:

First

class Program { static Stack<int> S = new Stack<int>(); static int Foo(int n) { if (n == 0) return 0; S.Push(0); S.Push(1); ... S.Push(999); return Foo( n-1 ); } } 

Second

 class Program { static Stack S = new Stack(); static int Foo(int n) { if (n == 0) return 0; S.Push(0); S.Push(1); ... S.Push(999); return Foo( n-1 ); } } 

Both of them do the same:

  • Create a stack (common for <int> for the first example and an object stack for the second).

  • Declare a method that calls itself recursively n times (n> = 0), and at each step, press 1000 integers inside the created stack.

When I run the first example with Foo(30000) , an exception does not occur, but the second example is reset with Foo(1000) , only n = 1000.

When I saw the CIL generated for both cases, the only difference was the box part for each click:

First

 IL_0030: ldsfld class [System]System.Collections.Generic.Stack`1<int32> Test.Program::S IL_0035: ldc.i4 0x3e7 IL_003a: callvirt instance void class [System]System.Collections.Generic.Stack`1<int32>::Push(!0) IL_003f: nop 

Second

 IL_003a: ldsfld class [mscorlib]System.Collections.Stack Test.Program::S IL_003f: ldc.i4 0x3e7 IL_0044: box [mscorlib]System.Int32 IL_0049: callvirt instance void [mscorlib]System.Collections.Stack::Push(object) IL_004e: nop 

My question is: why, if there is no secondary overload of the CIL stack for the second example, does it happen β€œfaster” than the first?

+49
c # stack-overflow il
Mar 04 '15 at 21:17
source share
1 answer

Why, if there is no significant overload of the CIL stack for the second example, does it happen β€œfaster” than the first?

Please note that the number of CIL instructions does not accurately reflect the amount of work or memory that will be used. A single instruction can be very low or very high, so counting CIL instructions is not an accurate way to measure "performance".

Also understand that CIL is not what is being executed. JIT compiles CIL into valid machine instructions with an optimization phase, so CIL can be very different from the actual instructions executed.

In the second case, since you are using a non-shared collection, each Push call requires an integer to be boxed, as you defined in CIL.

Boxing the whole effectively creates an object that wraps Int32 for you. Instead of just loading a 32-bit integer onto the stack, it now needs to load a 32-bit integer onto the stack and then pop it, which also effectively loads the object reference onto the stack.

If you check this in the Disassembly window, you will see that the difference between the generic and non-generic versions is dramatic and much more significant than the proposed CIL.

The general version compiles efficiently as a series of calls like:

 0000022c nop S.Push(25); 0000022d mov ecx,dword ptr ds:[03834978h] 00000233 mov edx,19h 00000238 cmp dword ptr [ecx],ecx 0000023a call 71618DD0 0000023f nop S.Push(26); 00000240 mov ecx,dword ptr ds:[03834978h] 00000246 mov edx,1Ah 0000024b cmp dword ptr [ecx],ecx 0000024d call 71618DD0 00000252 nop S.Push(27); 

Non-scam, on the other hand, should create boxed objects and instead compiles to:

 00000645 nop S.Push(25); 00000646 mov ecx,7326560Ch 0000064b call FAAC20B0 00000650 mov dword ptr [ebp-48h],eax 00000653 mov eax,dword ptr ds:[03AF4978h] 00000658 mov dword ptr [ebp+FFFFFEE8h],eax 0000065e mov eax,dword ptr [ebp-48h] 00000661 mov dword ptr [eax+4],19h 00000668 mov eax,dword ptr [ebp-48h] 0000066b mov dword ptr [ebp+FFFFFEE4h],eax 00000671 mov ecx,dword ptr [ebp+FFFFFEE8h] 00000677 mov edx,dword ptr [ebp+FFFFFEE4h] 0000067d mov eax,dword ptr [ecx] 0000067f mov eax,dword ptr [eax+2Ch] 00000682 call dword ptr [eax+18h] 00000685 nop S.Push(26); 00000686 mov ecx,7326560Ch 0000068b call FAAC20B0 00000690 mov dword ptr [ebp-48h],eax 00000693 mov eax,dword ptr ds:[03AF4978h] 00000698 mov dword ptr [ebp+FFFFFEE0h],eax 0000069e mov eax,dword ptr [ebp-48h] 000006a1 mov dword ptr [eax+4],1Ah 000006a8 mov eax,dword ptr [ebp-48h] 000006ab mov dword ptr [ebp+FFFFFEDCh],eax 000006b1 mov ecx,dword ptr [ebp+FFFFFEE0h] 000006b7 mov edx,dword ptr [ebp+FFFFFEDCh] 000006bd mov eax,dword ptr [ecx] 000006bf mov eax,dword ptr [eax+2Ch] 000006c2 call dword ptr [eax+18h] 000006c5 nop 

Here you can see the meaning of boxing.

In your case, when boxing, an integer causes the linking of box objects to the stack. On my system, this causes stackoverflow for any calls that exceed Foo(127) (32 bits), which means that integers and object references in the box (4 bytes each) are all stored on the stack, since 127 * 1000 * 8 == 1016000, which is dangerously close to the stack size by default thread size 1 for .NET applications.

When using the universal version, since there is no boxed object, integers must not be stored on the stack, and the same register is reused. This allows you to significantly increase (> 40,000 on my system) before using the stack.

Please note that this will be a CLR version and platform dependent, as there is another xIT / x64 JIT.

+59
Mar 04 '15 at 22:05
source share



All Articles