I am debugging an "Access Violation" exception in a large C++ application (Visual Studio 2015). The application is built from several libraries, and the problem occurs in one of them (SystemC), although I suspect that the source of the problem lies elsewhere.
What I see is a function call that corrupts the address of a member of the calling function's object.
    m_update_phase = true;
    m_prim_channel_registry->perform_update();
    m_update_phase = false;

    inline void sc_prim_channel_registry::perform_update()
    {
        for( int i = m_update_last; i >= 0; -- i ) {
            m_update_array[i]->perform_update();
        }
        m_update_last = -1;
    }
(These are excerpts from systemc\kernel\sc_simcontext.cpp and systemc\communication\sc_prim_channel.h, part of the SystemC library.)
The error occurs after several iterations through the code above. The call to m_prim_channel_registry->perform_update() throws an exception: 0xC0000005: Access violation writing location 0x0F4CD9E9.
This only happens when the application is built in the Release configuration.
Looking at the assembly code, I see that sc_prim_channel_registry::perform_update() was inlined, and that the inner call to m_update_array[i]->perform_update() seems to corrupt the stack frame of the calling function.
When m_update_last = -1; is executed, &m_update_last no longer points to a valid memory location and the exception is thrown. (m_update_last is a plain member of the sc_prim_channel_registry class, of type int.)
        m_update_phase = true;
        m_prim_channel_registry->perform_update();
    1034D99E  mov         eax,dword ptr [esi+10h]
    1034D9A1  mov         byte ptr [esi+0A3h],1
    1034D9A8  mov         dword ptr [ebp-18h],eax
    1034D9AB  mov         ebx,dword ptr [eax+28h]
    1034D9AE  test        ebx,ebx
    1034D9B0  js          $LN163+0FEh (1034D9D0h)
    1034D9B2  mov         esi,eax
    1034D9B4  mov         eax,dword ptr [esi+20h]
    1034D9B7  mov         edi,dword ptr [eax+ebx*4]
    1034D9BA  mov         ecx,edi
    1034D9BC  mov         eax,dword ptr [edi]
    1034D9BE  call        dword ptr [eax+14h]
    1034D9C1  sub         ebx,1
    1034D9C4  mov         byte ptr [edi+1Ch],0
    1034D9C8  jns         $LN163+0E2h (1034D9B4h)
    1034D9CA  mov         esi,dword ptr [this]
    1034D9CD  mov         eax,dword ptr [ebp-18h]
    1034D9D0  mov         dword ptr [eax+28h],0FFFFFFFFh
        m_update_phase = false;
The exception was thrown at address 1034D9D0. Thus, the last instructions executed were:
    0F97D9CD  mov         eax,dword ptr [ebp-18h]
    0F97D9D0  mov         dword ptr [eax+28h],0FFFFFFFFh
The address of m_prim_channel_registry is held in [ebp-18h] and in eax, and [eax+28h] is m_update_last.
Looking at esp and ebp in the Watch window just before the inner call to perform_update(), I see this:
    ebp-18h    0x0022fd5c    unsigned int
    esp        0x0022fd60    unsigned int
This is strange. The difference between them is only 4 bytes, so the next push onto the stack will make them equal and overwrite [ebp-18h]!
[ebp-18h] contains a copy of this->m_prim_channel_registry. The instruction at 1034D9BE, call dword ptr [eax+14h], pushes the return address onto the stack and thereby corrupts the contents of [ebp-18h]. It looks as if the stack has grown too far (downwards) and is corrupting the caller's frame.
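The arithmetic behind that can be checked with a toy model. This is only a sketch under assumptions, not the real stack: `Stack32`, `slot_after_call` and the saved pointer value 0x12345678 are made up for illustration. The esp and ebp-18h values are the ones from the Watch window, and 1034D9C1 is the return address that the call at 1034D9BE would push (the address of the next instruction in the listing).

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of a 32-bit x86 stack: sparse memory plus a stack pointer.
// Hypothetical helper, for illustration only.
struct Stack32 {
    std::uint32_t esp;
    std::map<std::uint32_t, std::uint32_t> mem;

    // `push` (and the implicit push done by `call`) first decrements esp
    // by 4 and then stores the value at the new esp.
    void push(std::uint32_t value) {
        esp -= 4;
        mem[esp] = value;
    }
};

// Stores `saved_ptr` in the local slot (the [ebp-18h] copy of the registry
// pointer), performs the push done by `call`, and returns what the slot
// holds afterwards.
std::uint32_t slot_after_call(std::uint32_t esp, std::uint32_t slot,
                              std::uint32_t saved_ptr,
                              std::uint32_t return_addr) {
    assert(esp % 4 == 0 && slot % 4 == 0);  // the model assumes aligned dwords
    Stack32 s{esp, {}};
    s.mem[slot] = saved_ptr;
    s.push(return_addr);  // what `call dword ptr [eax+14h]` does first
    return s.mem[slot];
}
```

With esp = 0x0022fd60 and the slot at 0x0022fd5c, `slot_after_call` returns the return address instead of the saved pointer. If that is what happens in the real program, eax is later loaded with a code address and the write to [eax+28h] lands in the code section. That would fit the faulting address 0x0F4CD9E9, whose low digits D9E9 equal D9C1 + 28h (the module base evidently differs between runs).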
My questions:
- Am I analyzing the problem correctly? Am I missing something here?
- What could lead to such corruption? I assume that the problem lies neither in the compiler nor in the SystemC library, but in something that happened earlier, somewhere else.
- What are the methods for debugging such corruption?
Update
I think I found the problem, but I can't say that I fully understand it. In the same function (sc_simcontext::crunch) where the outer perform_update() is called, SystemC methods are executed:
    // execute method processes
    sc_method_handle method_h = pop_runnable_method();
    while( method_h != 0 ) {
        try {
            method_h->execute();
        }
        catch( const sc_exception& ex ) {
            cout << "\n" << ex.what() << endl;
            m_error = true;
            return;
        }
        method_h = pop_runnable_method();
    }
These methods are deferred functions registered earlier.
One of these methods returned by executing ret 4, which shrank the caller's stack frame by 4 bytes every time it was called, until it reached the point where the violation described above occurred.
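To make the cumulative effect concrete, here is a small model of the stack-pointer drift. It is a sketch under assumptions: `StackModel` and `calls_until_overwrite` are hypothetical names, and the starting esp used in the note below (0x0022fd34, i.e. ebp-40h) is invented, since I did not record the real value. Only the slot address 0x0022fd5c comes from the debugger.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of 32-bit x86 stack-pointer arithmetic; no real stack is touched.
struct StackModel {
    std::uint32_t esp;

    // `call` pushes a 4-byte return address; returns the address written to.
    std::uint32_t call() {
        esp -= 4;
        return esp;
    }

    // `ret n` pops the return address and then discards n more bytes.
    void ret(std::uint32_t extra_bytes) {
        esp += 4 + extra_bytes;
    }
};

// The caller pushes no stack arguments and expects a plain `ret`.
// Returns how many calls complete before a later `call` writes its return
// address over `slot`, or -1 if that never happens within 1000 calls.
int calls_until_overwrite(std::uint32_t initial_esp, std::uint32_t slot,
                          std::uint32_t callee_ret_bytes) {
    assert(initial_esp % 4 == 0 && slot % 4 == 0);  // aligned dwords assumed
    StackModel s{initial_esp};
    for (int completed = 0; completed < 1000; ++completed) {
        if (s.call() == slot) {
            return completed;  // this call clobbers the local variable
        }
        s.ret(callee_ret_bytes);  // `ret 4` leaves esp 4 bytes too high
    }
    return -1;
}
```

In this model, `calls_until_overwrite(0x0022fd34, 0x0022fd5c, 4)` returns 11: after eleven unbalanced returns, the twelfth call writes its return address over the slot. With a balanced callee (`callee_ret_bytes` of 0) esp never drifts and the function returns -1.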
And how did I manage to register a corrupt SystemC method?
Apparently it is a bad idea to use SC_METHOD(f) when f is a virtual member function of the module. Doing so caused a different, unrelated "random" function to be called. I'm not quite sure why this happens or why this restriction exists. I also don't recall any warning about using virtual member functions as SystemC methods, but it is definitely a problem. When debugging the registration of the method in the SC_METHOD call, I could see that the function pointer stored inside pointed to a different function from the one passed to the SC_METHOD macro.
To fix the problem, I now call SC_METHOD(wrapper_f), where wrapper_f is a simple non-virtual member function of the module whose only job is to call f, the original virtual function. That's it.
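The shape of the workaround, without any SystemC dependency, looks roughly like this. It is a minimal sketch: `Registry`, `Module`, `DerivedModule` and `Method` are hypothetical stand-ins and do not reproduce how SC_METHOD stores its function pointer. Standard C++ handles pointers to virtual member functions correctly, so this code would also work without the wrapper; it only illustrates the pattern I ended up with.

```cpp
#include <cassert>
#include <string>
#include <vector>

class Registry;

// Hypothetical module base class; stands in for an sc_module.
class Module {
public:
    explicit Module(Registry& registry);
    virtual ~Module() = default;

    std::string log;  // records which f() actually ran

protected:
    virtual void f() { log += "Module::f;"; }

private:
    // Non-virtual wrapper: this is what gets registered.
    // The virtual dispatch to f() happens here, at call time.
    void wrapper_f() { f(); }
};

using Method = void (Module::*)();

// Hypothetical stand-in for a kernel that stores callbacks and runs them later.
class Registry {
public:
    void add(Module* host, Method method) {
        assert(host != nullptr);
        entries_.push_back(Entry{host, method});
    }

    void run_all() {
        for (const Entry& e : entries_) {
            (e.host->*e.method)();
        }
    }

private:
    struct Entry {
        Module* host;
        Method method;
    };
    std::vector<Entry> entries_;
};

// Register the non-virtual wrapper instead of the virtual f().
Module::Module(Registry& registry) { registry.add(this, &Module::wrapper_f); }

class DerivedModule : public Module {
public:
    explicit DerivedModule(Registry& registry) : Module(registry) {}

protected:
    void f() override { log += "DerivedModule::f;"; }
};
```

Constructing a `DerivedModule` with a `Registry` and then calling `run_all()` runs the derived override, so the virtual behaviour is kept while the registered pointer always refers to a non-virtual function.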