Given that the dynamic type of g is known to be exactly gadget, the compiler can devirtualize the call to bar after inlining foo, regardless of whether final is used in the class gadget declaration or in the gadget::bar declaration. I will analyze a similar program that does not use iostreams, since its generated assembly is easier to read:
class widget {
public:
    void foo() { bar(); }
private:
    virtual void bar() = 0;
};

class gadget : public widget {
    void bar() override { ++counter; }
public:
    int counter = 0;
};

int test1() { gadget g; g.foo(); return g.counter; }
int test2() { gadget g; g.foo(); g.foo(); return g.counter; }
int test3() { gadget g; g.foo(); g.foo(); g.foo(); return g.counter; }
int test4() { gadget g; g.foo(); g.foo(); g.foo(); g.foo(); return g.counter; }
int testloop(int n) { gadget g; while (--n >= 0) g.foo(); return g.counter; }
We can determine the success of devirtualization by looking at the generated assembly (GCC, clang). Both compilers optimize test1 to the equivalent of return 1; : the call is devirtualized and inlined, and the object is eliminated entirely. Clang does the same for test2 through test4, returning 2, 3, and 4 respectively, but GCC appears to progressively lose type information the more optimization it has to perform. Although it successfully optimizes test1 to return a constant, its test2 becomes approximately:
int test2() {
    gadget g;
    g.counter = 1;
    g.gadget::bar();
    return g.counter;
}
The first call was devirtualized and its effect inlined (g.counter = 1), but the second was merely devirtualized, not inlined. Adding another call in test3 results in:
int test3() {
    gadget g;
    g.counter = 1;
    g.gadget::bar();
    g.bar();
    return g.counter;
}
Again, the first call is fully inlined and the second merely devirtualized, but the third call is not optimized at all: it is a plain load of the function pointer from the vtable followed by an indirect call. The same holds for the additional call in test4:
int test4() {
    gadget g;
    g.counter = 1;
    g.gadget::bar();
    g.bar();
    g.bar();
    return g.counter;
}
Notably, neither compiler devirtualizes the call in the simple testloop, which they both compile to the equivalent of:
int testloop(int n) {
    gadget g;
    while (--n >= 0) g.bar();
    return g.counter;
}
and they even reload the vtable pointer from the object on each iteration.
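If you must have static dispatch at a call site like this today, a qualified call names the member function explicitly and bypasses the vtable altogether. The sketch below (testloop_qualified is my own name, and bar is made public here so the qualified call compiles from outside the class; in the original program it is private in gadget):

```cpp
#include <cassert>

class widget {
public:
    void foo() { bar(); }
private:
    virtual void bar() = 0;
};

class gadget : public widget {
public:
    void bar() override { ++counter; }  // public here so g.gadget::bar() is accessible
    int counter = 0;
};

// A qualified call names gadget::bar directly, so dispatch is static:
// no vtable load is emitted inside the loop.
int testloop_qualified(int n) {
    gadget g;
    while (--n >= 0) g.gadget::bar();
    return g.counter;
}
```

The trade-off is that a qualified call is always static, even if a derived class later overrides bar, so this is only safe when the dynamic type is genuinely known, as it is for a local object.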
Adding the final keyword to the class gadget declaration, to the gadget::bar definition, or to both does not affect the assembly generated by either compiler (GCC, clang).
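For reference, the two placements of final discussed above look like this, in a minimal sketch of the same program:

```cpp
#include <cassert>

class widget {
public:
    void foo() { bar(); }
private:
    virtual void bar() = 0;
};

// final on the class: nothing can derive from gadget, so the dynamic type
// of any gadget object is known to be exactly gadget.
class gadget final : public widget {
    // final on the member: bar cannot be overridden any further.
    void bar() override final { ++counter; }
public:
    int counter = 0;
};

int test1() {
    gadget g;
    g.foo();
    return g.counter;
}
```

Either placement is, in principle, enough to tell the compiler that the call inside foo must resolve to gadget::bar; the point of the experiment is that this extra information does not change what GCC or clang emit here.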
What does affect the generated assembly is removing the non-virtual interface (NVI), that is, making bar public and calling it directly instead of through foo. This program:
class widget {
public:
    virtual void bar() = 0;
};

class gadget : public widget {
public:
    void bar() override { ++counter; }
    int counter = 0;
};

int test1() { gadget g; g.bar(); return g.counter; }
int test2() { gadget g; g.bar(); g.bar(); return g.counter; }
int test3() { gadget g; g.bar(); g.bar(); g.bar(); return g.counter; }
int test4() { gadget g; g.bar(); g.bar(); g.bar(); g.bar(); return g.counter; }
int testloop(int n) { gadget g; while (--n >= 0) g.bar(); return g.counter; }
is fully optimized by both compilers (GCC, clang) to the equivalent of:
int test1() { return 1; }
int test2() { return 2; }
int test3() { return 3; }
int test4() { return 4; }
int testloop(int n) { return n >= 0 ? n : 0; }
In conclusion, although compilers are able to devirtualize the calls to bar, they do not always do so in the presence of the NVI pattern. Devirtualization in today's optimizers remains imperfect.
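If the indirection matters and the set of implementations is known at compile time, one way to keep the NVI-style separation while guaranteeing static dispatch is the Curiously Recurring Template Pattern (CRTP). This is a sketch of the same widget/gadget pair reworked that way (my own reworking, not from the experiment above); there is simply no virtual call left for the optimizer to miss:

```cpp
#include <cassert>

// CRTP base: foo() is the public interface, and the "customization point"
// bar() is resolved statically through the Derived template parameter.
template <class Derived>
class widget {
public:
    void foo() { static_cast<Derived*>(this)->bar(); }
};

class gadget : public widget<gadget> {
public:
    void bar() { ++counter; }
    int counter = 0;
};

int testloop(int n) {
    gadget g;
    while (--n >= 0) g.foo();  // static dispatch: trivially inlinable
    return g.counter;
}
```

The cost is that widget is no longer a single runtime polymorphic base, so this only applies when heterogeneous containers of widget* are not needed.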