C++ virtual table lookup: how does it search and replace?

Let's look at an example below:

    #include <string>
    using std::string;

    class Base {
        virtual string function1() { return "Base - function1"; }
        virtual string function2() { return "Base - function2"; }
    };

    class Derived : public Base {
        virtual string function2() { return "Derived - function2"; }
        virtual string function1() { return "Derived - function1"; }
        string function3() { return "Derived - function3"; }
    };

Is the vtable structure like this:

    Base-vTable
    -----------------------------------
    name_of_function   address_of_function
    function1          &function1
    function2          &function2
    -----------------------------------

    Derived-vTable
    -----------------------------------
    name_of_function   address_of_function
    function1          &function1
    function2          &function2
    -----------------------------------

or is it like this:

    Base-vTable
    -----------------------
    Offset   function
    +0       function1
    +4       function2
    -----------------------

    Derived-vTable
    -----------------------
    Offset   function
    +0       function1
    +4       function2
    -----------------------

If it is the latter, what is this offset, and where is it used?

And the function name: is it the mangled name? If it is mangled, the mangled names of the base and derived functions will not match, and the vtable lookup will not work. The compiler mangles the names of all virtual functions, so it should be a mangled name; does that mean the mangled name is the same for the base and the derived class if the function is virtual?

+6
6 answers

Virtual tables are just arrays of function pointers, just like your second snippet. The compiler translates calls to virtual functions into indirect calls through such an array. For example,

    Base* b = /* ... */;
    b->function2();

translates to

    b->__vtable[1](b);   // b is passed as the hidden "this" argument

where I used the name __vtable to refer to the hidden pointer to the virtual table (note that it is usually not directly accessible).

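The order of the entries in the table is determined by the order in which the virtual functions are declared in the class that first introduces them (here Base). In practice an override such as Derived::function1 reuses the slot of the function it overrides, so declaring the overrides in a different order in Derived does not change the layout. Remember that the class definition is always available at the call site, so the compiler always knows which slot to use.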

+7

Let me explain with the following code; I hope it helps you understand.

    Base* p = new Derived;
    p->function2();

At compile time, a vtable is created for each class. The vtable of the Base class has the same layout as the vtable of the Derived class: both have two entries, as in your first case, although they hold different function addresses. The compiler also inserts code into the constructors to initialize the object's vptr.

When the compiler sees the statement p->function2();, it cannot bind the call to a specific function, since it only knows that p points to a Base. From the vtable layout of the Base class, it learns the position of function2 (here, the 2nd entry in the vtable).

At run time, the object's vptr points to the vtable of the Derived class (Derived's constructor set it), so the function in the 2nd entry of that vtable, Derived::function2, is called.

+3

The easiest way to find out is to look at an actual implementation.

Consider the following code:

    struct Base { virtual void foo() = 0; };
    struct Derived : Base { virtual void foo() { } };

    Base& base();

    void bar() {
        Base& b = base();
        b.foo(); // virtual call
    }

Now submit this to Clang's online "Try Out" page to get the LLVM IR:

    ; ModuleID = '/tmp/webcompile/_6336_0.bc'
    target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
    target triple = "x86_64-unknown-linux-gnu"

    %struct.Base = type { i32 (...)** }

    define void @_Z3barv() {
      %1 = tail call %struct.Base* @_Z4basev()
      %2 = bitcast %struct.Base* %1 to void (%struct.Base*)***
      %3 = load void (%struct.Base*)*** %2, align 8
      %4 = load void (%struct.Base*)** %3, align 8
      tail call void %4(%struct.Base* %1)
      ret void
    }

    declare %struct.Base* @_Z4basev()

Since I assume you are not yet familiar with the IR, let's review it piece by piece.

First comes something you need not worry about: it identifies the architecture (processor and system) the code is compiled for, along with its properties.

    ; ModuleID = '/tmp/webcompile/_6336_0.bc'
    target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
    target triple = "x86_64-unknown-linux-gnu"

Then LLVM describes the types:

 %struct.Base = type { i32 (...)** } 

LLVM types are structural, so all we learn is that Base consists of a single element of type i32 (...)**: this is the "infamous" v-table pointer. Why such a weird type? Because the v-table stores pointers to functions of many different types. That would make it a heterogeneous array, which is impossible, so instead it is treated as an array of "generic" elements of unknown type (to mark that we don't know what is in there), and it is up to the code to cast each pointer to the appropriate function pointer type before actually using it (or rather, it would be if we were in C or C++; the IR is much lower level).

Jumping to the end:

 declare %struct.Base* @_Z4basev() 

this declares a function (base, whose mangled name is _Z4basev) that returns a pointer to Base: in the IR, references and pointers are both represented as pointers.

So, let's look at the definition of bar (or _Z3barv, as it is mangled). This is where the interesting things happen:

  %1 = tail call %struct.Base* @_Z4basev() 

A call to base, which returns a pointer to Base (the return type is always spelled out at the call site, which makes the IR much easier to analyze); the result is stored in a constant named %1.

  %2 = bitcast %struct.Base* %1 to void (%struct.Base*)*** 

A strange bitcast, which converts our Base* into a pointer to rather odd things (a pointer to a pointer to a pointer to a function)... In fact, this is how we get at the v-table pointer. It is not "named"; we simply ensured in the type definition that it is the first element.

  %3 = load void (%struct.Base*)*** %2, align 8
  %4 = load void (%struct.Base*)** %3, align 8

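First the v-table pointer is loaded into %3 (from the address held in %2), and then the function pointer is loaded into %4 (from the address held in %3, that is, the first slot, since foo is the first virtual function). So, if base() actually returned a Derived, %4 is now &Derived::foo.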

  tail call void %4(%struct.Base* %1) 

Finally, we call the function, passing it %1 (the object) as the this argument.

+3

It is the second case: the entries are pointers, which take 4 bytes each on 32-bit machines (hence +0, +4).

Function names are not needed at run time: they are not stored in the vtable, and they survive in the executable only as symbol or debugging information. A virtual table is just an array of function pointers that the generated code indexes directly.

+2

In fact, it is up to the compiler: the standard does not specify the memory representation. The standard only requires that polymorphism always works (even with inline functions, like yours). Depending on the context and the compiler's cleverness, your functions may be inlined, so sometimes there may not even be a call or jmp instruction. However, with most compilers the second option is by far the most likely.

In your case:

    #include <string>
    using std::string;

    class Base {
    public:
        virtual string function1() { return "Base - function1"; }
        virtual string function2() { return "Base - function2"; }
    };

    class Derived : public Base {
    public:
        virtual string function2() { return "Derived - function2"; }
        virtual string function1() { return "Derived - function1"; }
    };

Suppose you have:

    Base* base = new Base;
    Base* derived = new Derived;
    base->function1();
    derived->function2();

For the first call, the generated code loads the object's vftable address (here, Base's vftable) and calls the first function in that vftable. For the second call, the vftable is somewhere else, since the object is of type Derived. The second function is found at a fixed offset from the start of the vftable (vftable + offset, where each entry is most likely 4 bytes, but again, that depends on the platform).

+1

When a virtual function is added to a class, the compiler creates a hidden pointer (called the v-ptr) as a member of the class. [You can check this with sizeof(class), which typically grows by sizeof(pointer), possibly plus padding.] The compiler also adds code at the beginning of the constructor to initialize the v-ptr to the start of the class's v-table. When another class derives from this class, the derived class inherits the v-ptr, and for the Derived class the v-ptr is initialized to the start of Derived's v-table. As we already know, each class's v-table stores the addresses of its own versions of the virtual functions. [Note that if a virtual function is not overridden in the derived class, the v-table stores the address of the base version, or, with multi-level inheritance, of the most derived version of the function in the hierarchy.] So at run time, the call simply goes through this v-ptr. If a base class pointer points to a base object, the v-ptr points to the base version of the v-table, so the base version of the function is called automatically. The same applies to a Derived object.

+1

Source: https://habr.com/ru/post/900759/

