Debugging stack corruption issues

I am debugging an "Access Violation" exception in a large C++ application (Visual Studio 2015). The application is built from several libraries, and the failure occurs in one of them (SystemC), although I suspect that the root cause lies elsewhere.

What I see is a function call that corrupts the address of a member used by the calling function.

    m_update_phase = true;
    m_prim_channel_registry->perform_update();
    m_update_phase = false;

    inline void sc_prim_channel_registry::perform_update()
    {
        for( int i = m_update_last; i >= 0; -- i ) {
            m_update_array[i]->perform_update();
        }
        m_update_last = -1;
    }

(These are excerpts from systemc\kernel\sc_simcontext.cpp and systemc\communication\sc_prim_channel.h, which are part of the SystemC library.)

The error occurs after several iterations through the code above. The call to m_prim_channel_registry->perform_update() throws the exception 0xC0000005: Access violation writing location 0x0F4CD9E9.
This only happens when the application is built in the Release configuration.

Looking at the assembly code, I see that sc_prim_channel_registry::perform_update() was inlined, and the call to the inner function m_update_array[i]->perform_update() seems to corrupt the stack frame of the calling function.
By the time m_update_last = -1; is executed, &m_update_last no longer points to a valid memory location, and the exception is thrown. (m_update_last is a plain int member of the sc_prim_channel_registry class.)

    m_update_phase = true;
    m_prim_channel_registry->perform_update();
    1034D99E  mov   eax,dword ptr [esi+10h]
    1034D9A1  mov   byte ptr [esi+0A3h],1
    1034D9A8  mov   dword ptr [ebp-18h],eax
    1034D9AB  mov   ebx,dword ptr [eax+28h]
    1034D9AE  test  ebx,ebx
    1034D9B0  js    $LN163+0FEh (1034D9D0h)
    1034D9B2  mov   esi,eax
    1034D9B4  mov   eax,dword ptr [esi+20h]
    1034D9B7  mov   edi,dword ptr [eax+ebx*4]
    1034D9BA  mov   ecx,edi
    1034D9BC  mov   eax,dword ptr [edi]
    1034D9BE  call  dword ptr [eax+14h]
    1034D9C1  sub   ebx,1
    1034D9C4  mov   byte ptr [edi+1Ch],0
    1034D9C8  jns   $LN163+0E2h (1034D9B4h)
    1034D9CA  mov   esi,dword ptr [this]
    1034D9CD  mov   eax,dword ptr [ebp-18h]
    1034D9D0  mov   dword ptr [eax+28h],0FFFFFFFFh
    m_update_phase = false;

The exception is raised at address 1034D9D0, so the last instructions executed were:

    0F97D9CD  mov   eax,dword ptr [ebp-18h]
    0F97D9D0  mov   dword ptr [eax+28h],0FFFFFFFFh

The address of m_prim_channel_registry is held in [ebp-18h] and in eax, and [eax+28h] is m_update_last.

Looking at esp and ebp in the Watch window just before the inner call to perform_update(), I see this:

    ebp-18h    0x0022fd5c    unsigned int
    esp        0x0022fd60    unsigned int

This is strange. The two values differ by only 4 bytes, so the next push onto the stack will make them equal and overwrite [ebp-18h]!
[ebp-18h] contains a copy of this->m_prim_channel_registry. When the instruction at 1034D9BE (call dword ptr [eax+14h]) pushes onto the stack, it corrupts the contents of [ebp-18h]. It looks as if the stack has grown too far (downward) and is corrupting the previous frame.

My questions:

  • Am I analyzing the problem correctly? Am I missing something here?
  • What could lead to such corruption? I assume that the problem lies neither in the compiler nor in the SystemC library, but in something that happened earlier, somewhere else.
  • What techniques are there for debugging this kind of corruption?

Update

I think I found the problem, but I can't say that I fully understand it. The same function that makes the outer perform_update() call (sc_simcontext::crunch) also executes SystemC methods:

    // execute method processes
    sc_method_handle method_h = pop_runnable_method();
    while( method_h != 0 ) {
        try {
            method_h->execute();
        }
        catch( const sc_exception& ex ) {
            cout << "\n" << ex.what() << endl;
            m_error = true;
            return;
        }
        method_h = pop_runnable_method();
    }

These methods are deferred functions that were registered earlier.
One of them returned by executing ret 4, which shrank the caller's stack frame a little more on every call, until the violation described above occurred.

And how did I manage to register a broken SystemC method?
Apparently it is a bad idea to use SC_METHOD(f) when f is a virtual member function of the module. Doing so caused another, unrelated, seemingly random function to be called. I'm not quite sure why this happens or why this restriction exists. I also don't recall any warning against using virtual member functions as SystemC methods, but it is definitely a problem. When I debugged the method registration inside the SC_METHOD call, I could see that the stored function pointer pointed to a different function from the one passed to the SC_METHOD macro.

To fix the problem, I now call SC_METHOD(wrapper_f), where wrapper_f is a simple non-virtual member function of the module whose only job is to call f, the original virtual function. That's it.

+5
2 answers

You probably have problems with member function pointers on MSVC.

Consider the following code, the main.cpp file:

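    #include <cstdio>

    struct base;                          // forward declaration only
    typedef void (base::*baseptr_t)();    // declared while base is incomplete

    struct base { void foo() { } };       // the definition is visible in this file

    void callfoo(base *obj, baseptr_t ptr);

    int main()
    {
        base obj;
        // sizeof yields a size_t, so print it with %zu
        std::printf("sizeof(baseptr_t)=%zu\n", sizeof(baseptr_t));
        callfoo(&obj, &base::foo);
    }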

and file callfoo.cpp:

    #include <cstdio>

    struct base;                          // base is never defined in this file
    typedef void (base::*baseptr_t)();

    void callfoo(base *obj, baseptr_t ptr)
    {
        // sizeof yields a size_t, so print it with %zu
        std::printf("sizeof(baseptr_t)=%zu\n", sizeof(baseptr_t));
        (obj->*ptr)();
    }

On x86_64, this prints:

    sizeof(baseptr_t)=8
    sizeof(baseptr_t)=24

before crashing with an access violation.

This is because MSVC generates 8-byte member function pointers for classes whose definition is known, but has to generate 24-byte pointers when the class definition is not available. The two files therefore disagree about the layout of baseptr_t.

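The compiler has ways to control this behavior: the /vmg and /vmb compiler options, the pointers_to_members pragma, and the __single_inheritance, __multiple_inheritance and __virtual_inheritance keywords.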

PS: I could not reproduce your problem, but you can also check the sc_process.h header from SystemC, which contains the following lines:

    #if defined(_MSC_VER)
    #if ( _MSC_VER > 1200 )
    #   define SC_USE_MEMBER_FUNC_PTR
    #endif
    #else
    #   define SC_USE_MEMBER_FUNC_PTR
    #endif

You can try undefining this macro for your build, in which case SystemC will use a different strategy for calling the process function.

PS2: a member function pointer can be 8, 16 or 24 bytes in size, depending on the class hierarchy, so there have to be three ways to dereference a member function pointer, and each of them has to handle both virtual and non-virtual calls.

+2

You seem to know what you are doing.

I can offer advice rather than a solution, but this is something I have run into more than a few times as a cause of stack corruption.

Check the function that causes the corruption, perform_update(). Does it define a large array as a local variable? If so, the array probably exceeds the available stack and overwrites the return address and other important data. This is the most common cause of stack corruption that I have encountered.

It is not easy to track down, because it depends on both the size of the local array and the size of the stack, which varies from system to system.

0

Source: https://habr.com/ru/post/1263193/

