The performance difference you are seeing comes from the Windows debug heap, which is somewhat stealthy about how it enables itself, even in release builds.
I took the liberty of building a 64-bit debug image of a simpler program and found this:
- msvcr110d.dll!_CrtIsValidHeapPointer(const void * pUserData = 0x0000000001a8b540)
- msvcr110d.dll!_free_dbg_nolock(void * pUserData = 0x0000000001a8b540, int nBlockUse = 1)
- msvcr110d.dll!_free_dbg(void * pUserData = 0x0000000001a8b540, int nBlockUse = 1)
- msvcr110d.dll!operator delete(void * pUserData = 0x0000000001a8b540)
Of particular interest to me was the body of msvcr110d.dll!_CrtIsValidHeapPointer, which turns out to be this:
    if (!pUserData)
        return FALSE;

    // Note: all this does is check for null
    if (!_CrtIsValidPointer(pHdr(pUserData), sizeof(_CrtMemBlockHeader), FALSE))
        return FALSE;

    // but this is expensive
    return HeapValidate( _crtheap, 0, pHdr(pUserData) );
That HeapValidate() call is brutal.
OK, maybe I would expect this in a debug build, but certainly not in a release build. As it turns out, the release build fares better, but look at its call stack:
- ntdll.dll!RtlDebugFreeHeap()
- ntdll.dll!string "Enabling heap debugging options\n"()
- ntdll.dll!RtlFreeHeap()
- kernel32.dll!HeapFree()
- msvcr110.dll!free(void * pBlock)
This is interesting, because when I started the program first and then attached to the running process with the IDE (or WinDbg), without letting the debugger control the startup environment, this call stack stopped at ntdll.dll!RtlFreeHeap(). In other words, when the program is launched outside the IDE, RtlDebugFreeHeap is not invoked. But why?
I thought to myself: somehow the debugger is flipping a switch to enable heap debugging. After some digging, I discovered that the "switch" is the debugger itself. Windows uses special debug heap functions (RtlDebugAllocHeap and RtlDebugFreeHeap) if the process being run is spawned by a debugger. This MSDN page on WinDbg alludes to this, along with other interesting tidbits about debugging under Windows:
From "Debugging a User-Mode Process Using WinDbg":
The processes that the debugger creates (also known as spawned processes) behave somewhat differently than processes that the debugger does not create.
Instead of using the standard heap API, the processes that the debugger creates use a special debug heap. You can force the spawned process to use the standard heap instead of the debug heap using the _NO_DEBUG_HEAP environment variable or the -hd command-line option.
Now we're getting somewhere. To test this, I simply dropped in a sleep() with enough time for me to attach the debugger rather than spawn the process with it, then let it run on its merry way. Sure enough, as mentioned earlier, it sailed full speed ahead.
Based on the contents of that article, I took the liberty of updating my Release mode configurations to define _NO_DEBUG_HEAP=1 in the debugging environment settings of my project files. I'm obviously still interested in granular heap activity in debug builds, so those configurations are left as they are. After doing this, my release builds ran substantially faster under VS2012 (and VS2010), and I invite you to try it as well.