Managed Heap Debugging

I have an interesting problem when I have two dumps of two processes that show managed heap damage. I am using clr.dll 4.0.30319.1008 (RTMGDR.030319-1000) on x64 on Windows 7 x64. With VerifyHeap, I know that I have corruption:

0:016> !VerifyHeap object 000000000367ec60: bad member 0000000004fba740 at 000000000367ec78 curr_object: 000000000528CF90 Last good object: 000000000367ec40 

The object is an array with two elements.

 0:016> !DumpObj /d 000000000367ec60 Name: System.Object[] MethodTable: 000007feedf6adf8 EEClass: 000007feedaefc68 Size: 48(0x30) bytes Array: Rank 1, Number of elements 2, Type CLASS (Print Array) Element Type:System.Object Fields: None 0:016> !DumpArray /d 000000000367ec60 Name: System.Object[] MethodTable: 000007feedf6adf8 EEClass: 000007feedaefc68 Size: 48(0x30) bytes Array: Rank 1, Number of elements 2, Type CLASS Element Methodtable: 000007feedf65a48 [0] 0000000004fba740 [1] 000000000367ec90 

The first pointer is a damaged value, which points to a bool value with a value of 1, which is not a managed object. It is for this reason that the GC is coping.

 0:016> db 0000000004fba740-10 00000000`04fba730 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................ 00000000`04fba740 **01 00** 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................ 00000000`04fba750 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................ 00000000`04fba760 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................ 00000000`04fba770 00 00 00 00 00 00 00 00-b8 1b f7 ed fe 07 00 00 ................ 00000000`04fba780 d0 a7 fb 04 00 00 00 00-00 00 00 00 00 00 00 00 ................ 00000000`04fba790 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................ 00000000`04fba7a0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................ 0:016> !lno 04fba740 Before: 0000000004fba718 System.Collections.Hashtable+bucket[] After: 0000000004fba778 System.Collections.Hashtable Heap local consistency confirmed. 

The objects around do not matter, because depending on the dump, they change randomly.

 !GCRoot 0000000367ec60 Scan Thread 16 OSTHread 5fd0 r10:Root: 000000000367ec60(System.Object[]) Scan Thread 17 OSTHread 10cc RSP:1de4cd58:Root: 000000000367ec60(System.Object[]) 

The array itself is not embedded, which indicates its collection. Interestingly, the second object in the array is ThreadLocal data from an already passed thread. This is similar to the fact that the CLR stores ThreadLocal objects in an array of objects for each thread that can be assembled when it exits. Line number 17 executes the actual collection that throws an ExecutionEngineException. But Thread 16 seems to support Thread Local data in an array that needs to be pinned (this is not the case), to which it should not have access.

Thread nr 16 seems to contain TLS data from a stream that has already passed, and possibly write to it.

 OS Thread Id: 0x5fd0 (16) Child SP IP Call Site 000000001dffdfe8 0000000076eb135a [NDirectMethodFrameStandalone: 000000001dffdfe8] MS.Win32.UnsafeNativeMethods.MsgWaitForMultipleObjects(Int32, IntPtr[], Boolean, Int32, Int32) 000000001dffdfa0 000007fecfa7e1bd DomainBoundILStubClass.IL_STUB_PInvoke(Int32, IntPtr[], Boolean, Int32, Int32)*** WARNING: Unable to verify checksum for UIAutomationClientsideProviders.ni.dll 000000001dffe090 000007fecfa7b28d MS.Internal.AutomationProxies.Misc.MsgWaitForMultipleObjects(Microsoft.Win32.SafeHandles.SafeWaitHandle, Boolean, Int32, Int32) 000000001dffe110 000007fecfab5cdd MS.Internal.AutomationProxies.QueueProcessor.WaitForWork() 000000001dffe1b0 000007feede22f78 System.Threading.ExecutionContext.runTryCode(System.Object)*** WARNING: Unable to verify checksum for mscorlib.ni.dll 000000001dffe8d8 000007fef08044c4 [HelperMethodFrame_PROTECTOBJ: 000000001dffe8d8] System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode, CleanupCode, System.Object) 000000001dffea00 000007feede11661 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) 000000001dffea60 000007feede115ab System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) 000000001dffeab0 000007feedea6d8d System.Threading.ThreadHelper.ThreadStart() 000000001dffef08 000007fef08044c4 [GCFrame: 000000001dffef08] 000000001dfff2f0 000007fef08044c4 [DebuggerU2MCatchHandlerFrame: 000000001dfff2f0] 

Here is the stop GC collects:

 0:017> !DumpStack OS Thread Id: 0x10cc (17) Current frame: clr!WKS::gc_heap::mark_object_simple+0x75 Child-SP RetAddr Caller, Callee 000000001de4cce0 000007fef0877fb2 clr!WKS::gc_heap::mark_through_cards_for_segments+0x36b 000000001de4ce50 000007fef0873980 clr!WKS::gc_heap::mark_phase+0x160, calling clr!WKS::gc_heap::mark_through_cards_for_segments 000000001de4ce80 000007fef086fce7 clr!EEJitManager::CleanupCodeHeaps+0x57, calling clr!CrstBase::Leave 000000001de4cea0 000007fef07e3dc1 clr!CrstBase::Leave+0x31, calling clr!GetThread 000000001de4ced0 000007fef0873f3d clr!WKS::gc_heap::gc1+0xae, calling clr!WKS::gc_heap::mark_phase 000000001de4cef0 000007fef0874786 clr!WKS::gc_heap::update_collection_counts+0x16, calling 000000000065006e 000000001de4cf20 000007fef0a1fa56 clr!WKS::gc_heap::garbage_collect+0x42e, calling clr!WKS::gc_heap::gc1 000000001de4cf60 000007feede2d774 (MethodDesc 000007feedaa93b8 +0x124 System.TimeZoneInfo.GetDateTimeNowUtcOffsetFromUtc(System.DateTime, Boolean ByRef)), calling (MethodDesc 000007feedaa8708 +0 System.TimeSpan.Add(System.TimeSpan)) 000000001de4cfa0 000007fef07fd4ff clr!SystemNative::__GetSystemTimeAsFileTime+0xf, calling kernel32!GetSystemTimeAsFileTimeStub 000000001de4cff0 000007fef087452e clr!WKS::GCHeap::GarbageCollectGeneration+0x14e, calling clr!WKS::gc_heap::garbage_collect 000000001de4d040 000007fef08734ce clr!WKS::gc_heap::try_allocate_more_space+0x25f, calling clr!WKS::GCHeap::GarbageCollectGeneration 000000001de4d080 000007fef0872f43 clr!WKS::gc_heap::allocate_small+0x158, calling clr!WKS::gc_heap::a_fit_segment_end_p 000000001de4d110 000007fef08731fe clr!FastAllocateObject+0x73e, calling clr!WKS::gc_heap::try_allocate_more_space 000000001de4d1f0 000007fef07fc8b8 clr!JIT_NewFast+0xb8, calling clr!FastAllocateObject 000000001de4d2c8 000007feede3fa80 (MethodDesc 000007feedaaa8e8 +0x40 System.Text.StringBuilder.ExpandByABlock(Int32)), calling clr!JIT_TrialAllocSFastMP_InlineGetThread 0:016> !Threads ThreadCount: 17 UnstartedThread: 0 BackgroundThread: 13 PendingThread: 0 DeadThread: 1 Hosted Runtime: no PreEmptive Lock ID OSID ThreadOBJ State GC GC Alloc Context Domain Count APT Exception 0 1 58e4 0000000000498ba0 2006020 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 STA 2 2 4190 000000000049ee80 b220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA (Finalizer) 6 3 48d4 000000001ac8bb60 1000220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 Ukn (Threadpool Worker) 8 5 5fbc 000000001aca1970 a009220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA (Threadpool Completion Port) 9 6 615c 000000001c4b2880 b020 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA 10 7 5818 000000001c4e7bd0 200b220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA 11 8 6e14 000000001c4f0850 7020 Enabled 0000000000000000:0000000000000000 0000000000481df0 2 STA 12 a 683c 000000001c512610 7220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 STA 14 b 6f40 000000001c521120 7220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 STA 15 c 5070 000000001c564760 100a220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA (Threadpool Worker) 16 d 5fd0 000000000049bc10 b220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA 17 e 10cc 000000001c62e370 b220 Enabled 0000000000000000:0000000000000000 0000000000481df0 2 MTA (GC) System.ExecutionEngineException (0000000002441228) XXXX f 000000001e102c80 15820 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 Ukn 22 10 158c 000000001e103aa0 1009220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA (Threadpool Worker) 23 12 47e8 000000001e1048c0 8019220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 Ukn (Threadpool Completion Port) 24 4 58a8 000000001e103390 8019220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 Ukn (Threadpool Completion Port) 25 9 2874 000000001e102570 8009220 Enabled 0000000000000000:0000000000000000 0000000000481df0 0 MTA (Threadpool Completion Port) 

This is all good and interesting, but I'm not sure how to move on. Since the error occurs on an automated test machine, where the test control process dies about 1-2 times a day, I can’t just attach a debugger to the process and set some breakpoints to protect records in certain memory locations. Any additional tips on how to evaluate this are welcome. I get more dumps to be able to do at least a differential analysis to check which tests can cause this.

For me, it looks like the CLR array that stores the flow statics is loose, and someone is writing the unooled bool value for the first element of the array. The CLR array does not contain a value, but the address in which the managed object should usually be located, but there is only the bool value (one), and not an ordinary CLR object with its object header.

Can incorrect PInvoke signatures cause this behavior? I have seen things like

  [DllImport( "kernel32.dll" )] public static extern bool Beep( int frequeny_in, int time_in ); 

which returns one byte of bool, but the Beep method returns 4 bytes of bool. Can the wrong return type from PInvoke (bool instead of int) cause such problems?

+4
source share

Source: https://habr.com/ru/post/1496965/


All Articles