Shared vtables between classes of the same name: a virtual method call crashes when casting to the base type

Question


See the UPDATES below: I think I can reproduce it now and need help.

I have a strange crash: a method works fine everywhere except in one place. Here is the code:

    struct base {
        virtual wchar_t* get() = 0; // can be { return NULL; } doesn't matter
    };

    struct derived: public base {
        virtual wchar_t* get() { return SomeData(); }
    };

    struct container {
        derived data;
    };

    // this is approx. how it is used in real program
    void output(const base& data) { data.get(); }

    smart_ptr<container> item = GetItSomehow();
    derived &v1 = item->data;
    v1.get(); // works OK
    //base &v2 = (base&)derived; // the old line, to understand old comments in the question
    base &v2 = v1; // or base* v2 doesn't matter
    v2.get(); // segmentation fault without going into method at all

Now, as I said, I call item->data.get() in many places on different objects, and it works... always. Except for one place. And even there it fails only if the object is cast to the base class (the output function is an example of why that is necessary).

Now the question is: HOW and WHY can this happen? I would suspect a pure virtual call, but I am not calling the virtual method in a constructor. I do not see how the two calls differ. I would also suspect the fact that the base method is abstract, but the result is the same if I add a body to it.

I can't come up with a small test case, because, as I said, it always works except in one place. If I knew why it does not work, I would not need a test case, because that would already be the answer...

P.S. The environment is Ubuntu 11.10 x64, but the program is compiled for 32 bits using a special gcc 4.5.2 build.

PPS One more clue, not sure if it is connected ...

    warning: can't find linker symbol for virtual table for `derived::get' value
    warning: found `SomeOtherDerivedFromBaseClass::SomeOtherCrazyFunction' instead

in a real program

UPDATE: how likely is it that this happens because gcc links the vtable to the wrong class, one with the same name but inside a different shared library? The "derived" class in the real application is actually defined in several shared libraries, and, even worse, there is another similar class with the same name but a different interface. What is strange is that it works without the cast to the base class.

I am particularly interested in the details about gcc / linking / vtables here.

Here is how I seem to be able to reproduce it:

    // --------- mod1.h
    class base {
    public:
        virtual void test(int i); // add method to make vtables different with mod2
        virtual const char* data();
    };

    class test: public base {
    public:
        virtual const char* data();
    };

    // --------- mod2.h
    class base {
    public:
        virtual const char* data();
    };

    class test: public base {
    public:
        virtual const char* data();
    };

    // --------- mod2.cpp
    #include "mod2.h"

    const char* base::data() { return "base2"; }
    const char* test::data() { return "test2"; }

    // --------- modtest.cpp
    #include <stdio.h>
    // !!!!!!!!! notice that we include mod1
    #include "mod1.h"

    int main() {
        test t;
        base& b = t;
        printf("%s\n", t.data());
        printf("%s\n", b.data());
        return 0;
    }

    // --------- how to compile and run
    g++ -c mod2.cpp && g++ mod2.o modtest.cpp && ./a.out

    // --------- output from the program
    queen3@pro-home:~$ ./a.out
    test2
    Segmentation fault

In modtest above, if we include "mod2.h" instead of "mod1.h", we get the normal output "test2\ntest2" without a segfault.

Question: What is the mechanism for this? How to detect and prevent it? I knew that static data in gcc would be linked to a single instance, but vtables...

+4

c++

queen3 Jan 13 '12 at 14:18


7 answers

Your answer is in your question: "Shared vtables between classes of the same name ...".

You have compiled one binary from two cpp files, but each cpp file includes a different header file and, in particular, a different definition of base. In C++, you cannot have two different classes with the same name. If the same name is used, then they are the same class, and the definitions must be consistent. (The obvious exception is putting them in two different namespaces.)

(Everything here is specific to the compiler. But this is probably the typical approach for most compilers.)

First, let's understand non-virtual methods. When you call a method on an object:

 b.foo(3);

the code is basically rewritten as follows, as if it were a regular free function:

 foo_(b,3);

using the method implemented as follows:

 void foo_(base * this, int i) { ... }

i.e. the this pointer is "secretly" passed as the first parameter to the function.

But things are not so simple with virtual methods. There will be two different free functions that implement get. We will call one of them get_base and the other get_derived. (It doesn't matter that you actually have a pure virtual method ( =0 ); it does not really change the story.)

The question is, how is the correct get selected at run time? Well, for every class that has at least one virtual method, the compiler creates a vtable. The vtable for a class lists all the virtual methods in that class. For instance:

    struct vtable_for_base_t {
        wchar_t* (*get_function_pointer)(base *); // initialized to get_base
    };

    vtable_for_base_t vtable_for_base;
    vtable_for_base.get_function_pointer = &get_base;

    vtable_for_????_t vtable_for_derived;
    vtable_for_derived.get_function_pointer = &get_derived;

The function pointer type is a pointer to a function that takes one parameter (a base* , which becomes this ) and returns wchar_t* .

Both base and derived actually hold a pointer to these vtables under the hood.

    struct base {
        vtable_for_base_t * vtable;
        .... other members of base
    };

    struct derived {
        vtable_for_????_t * vtable;
        .... other members of derived
    };

Whenever a base object is created, its vtable pointer is initialized to point to the vtable for base. Whenever a derived object is created, it instead points to the vtable for derived. Now, when the compiler sees b.get(), it rewrites it as follows:

 (b.vtable->get_function_pointer)(&b);

It looks up the vtable pointed to by the b object to get a pointer to the correct version of get. It then passes b to this function to make sure the function has the correct this pointer.

Thus, each object has a (hidden) member that knows the correct version of the virtual functions. In this scheme, the compiler assumes that the first entry in the vtable for base, as well as in the vtable for any type derived from base, is the get method.

When building vtables for derived classes, the first entries correspond to the methods that were in the base class, and they appear in the same order as in the base. Any new virtual methods in the derived class are listed after them.

If you had two virtual methods, foo and bar, in base, then these would be the first two entries in the vtable for base, and the corresponding versions for derived would also occupy the first two slots in the vtable for derived.

Now, to understand why you get the segfault. In mod2.h, base has a vtable in which data is the first (and only) entry. Therefore, any code that includes mod2.h and calls b.data() will execute the first entry in the vtable. But that is not what happens when modtest.cpp is compiled, because it includes mod1.h instead.

As a result, modtest.cpp sees a base class with two virtual methods, where data is the second entry in the vtable. Therefore, any attempt to execute b.data() actually becomes:

 (b.vtable.SECOND_ENTRY)(&b);

since it assumes that the second element will contain the data() entry.

It will try to fetch the second entry from the vtable, but the real vtable (built from mod2.h ) has only one entry! It therefore reads invalid memory, and everything fails.

In short, consider the definitions of these two structures in two different header files in C:

    // in one file
    struct A { int i; char c[500]; };

    // in another file
    struct A { char c[500]; int i; };

Nobody would expect this to work. The program will often access the wrong memory. For the same reason, you should not mix up vtables.

+4

Aaron mcdaid Jan 29 '12 at 23:18


There is no need for an explicit cast when treating a derived class as its parent class:

    #include <iostream>

    struct A {
        virtual void get() { std::cout << "A" << std::endl; }
    };

    struct B : public A {
        virtual void get() { std::cout << "B" << std::endl; }
    };

    int main(int argc, char **argv) {
        B b;
        A & a = b;
        a.get();
        return 0;
    }

Moreover, an explicit cast in this case can hide errors. With it you tell the compiler that you know what you are doing, and it won't stop you, and in many cases won't even warn you, when you do something that cannot work.

If the code does not compile without the cast, there is an error in the code (and in most cases the compiler gives you the reason in the error message).

+3

elmo Jan 13 '12 at 14:30


In your second example, you violate the One Definition Rule (ODR).

To quote from Wikipedia:

1. In any translation unit, a template, type, function, or object can have no more than one definition. Some of these can have any number of declarations. A definition provides an instance.
2. In the entire program, an object or non-inline function cannot have more than one definition; if an object or function is used, it must have exactly one definition. You can declare an object or function that is never used, in which case you do not have to provide a definition. In no event can there be more than one definition.
3. Some things, like types, templates, and extern inline functions, can be defined in more than one translation unit. For a given entity, each definition must be the same. Non-extern objects and functions in different translation units are different entities, even if their names and types are the same.

You are breaking the third part of the rule. Both base and test are defined several times, and the definitions in mod1.h and mod2.h conflict, so your program is invalid and has undefined behavior. That is why you sometimes see crashes and sometimes you don't; either way, the program is not valid. The compiler is not required to warn you, because the two definitions appear in different translation units, and the standard does not require it to check consistency across translation units in this case.

Preventing this kind of problem is quite simple. Namespaces were created for exactly this. Put your classes in a namespace of their own, and the ODR will no longer be a problem.

Finding these kinds of things is a bit trickier. One thing you can try is a unity build. It looks very scary at first sight, but it actually helps with many problems of this kind. As a side effect, a unity build also speeds up compilation during development. The link above gives instructions for setting up a unity build in Visual Studio, but it is actually quite simple to add one to a makefile (including automatically generating the necessary header).

+3

LiKao Jan 27 '12 at 19:09


 base &v2 = (base&)derived; // or base* v2, doesn't matter

must read

 base &v2 = v1;

+1

cli_hlt Jan 13 '12 at 14:35


Your problem here is not a violation of the One Definition Rule. Well, the One Definition Rule is ONE problem, but it can be solved using this method.

Dynamic cast will fix'er'up.

    test t;
    // Using a pointer to make the cast a little more obvious
    base *b = dynamic_cast<base *>(&t);

This is straight from the C++ tutorial at http://www.cplusplus.com/doc/tutorial/typecasting/ . It will return a NULL pointer or throw an exception on failure, depending on whether you cast a pointer or a reference. In any case, you will catch the error at runtime.

~~Although dynamic_casts is technically better practiced, static_cast can also be used.~~ UPDATE: you wanted to know how to catch it at runtime, and static_cast probably won't catch it at compile time, sorry.

Then, to avoid similar problems in the future, use explicit namespaces. There is really no reason not to use them. Even your main program can use one, even if its name is long, by aliasing it.

I will rip an example from IBM because these are schmucks:

    namespace INTERNATIONAL_BUSINESS_MACHINES {
        void f();
    }

    namespace IBM = INTERNATIONAL_BUSINESS_MACHINES;

If your libraries do not use namespaces, then they are bad libraries and should be removed, the media they came on should be immersed in an acid bath, and any tubes you downloaded them through should receive a triple dose of Draino. Although, of course, we are often stuck using code that leaves much to be desired...

+1

std''OrgnlDave Jan 31 '12 at 6:36


Question: What is the mechanism for this?

See other answers about ODR.

How to detect and prevent?

Create a bulk (unity) build of your libraries: a single translation unit that includes every dependency.

Make sure you use the correct scope and visibility. If the type is private, that is one case (an anonymous namespace or a private type). Otherwise, it should be public and visible to clients through inclusion. Including everything in one TU and using well-defined conventions for scope and visibility will catch many of these errors.

In some cases, the linker can also help. In fact, exporting your virtual definitions is a good idea for many reasons; the linker would then notice this problem.

I knew that static data in gcc would be linked to a single instance, but vtables...

may be duplicated if the virtual definitions are visible. That is, all your RTTI and vtable information can be emitted in every TU, which can cause serious bloat and add to compile and link time.

+1

justin Jan 31 '12 at 7:14


Mark b · Accepted Answer · 2012-01-13T14:47:36+0000

Edit in response to the update: In the updated code, where you use mod1 and mod2, you violate the One Definition Rule for classes (even if they appear in shared libraries). It basically means that the whole program may contain only one definition of a class ( base in this case), although that same definition may appear in several source files. If you have more than one definition, all bets are off and you get undefined behavior. In this case, the undefined behavior shows up as a crash. The fix, of course, is not to have multiple versions of the same class in the same program. This is usually achieved by defining each class in a single header (or in the implementation file for non-API/impl classes) and including that header wherever the class definition is needed.

Original answer: If it works everywhere except for one place, it sounds as if the object is not valid in that one place (working as a derived pointer but not as a base one sounds like you have entered undefined behavior territory). It could be memory corruption, a pointer to a deleted object, or something else. Your best bet is to run valgrind on it, if you can.
