Consider this code, compiled with gcc 4.5.1 (Ubuntu 10.04, Intel Core 2 Duo 3.0 GHz). It contains just two tests: in the first I call a virtual function directly, and in the second I call it through the wrapper class:
test.cpp
    #define ITER 100000000

    class Print{
    public:
        typedef Print* Ptr;
        virtual void print(int p1, float p2, float p3, float p4){}
    };

    class PrintWrapper
    {
    public:
        typedef PrintWrapper* Ptr;
        PrintWrapper(Print::Ptr print, int p1, float p2, float p3, float p4) :
            m_print(print), _p1(p1),_p2(p2),_p3(p3),_p4(p4){}
        ~PrintWrapper(){}
        void execute()
        {
            m_print->print(_p1,_p2,_p3,_p4);
        }
    private:
        Print::Ptr m_print;
        int _p1;
        float _p2,_p3,_p4;
    };

    Print::Ptr p = new Print();
    PrintWrapper::Ptr pw = new PrintWrapper(p, 1, 2.f,3.0f,4.0f);

    void test1()
    {
I profiled it with gprof and disassembled it with objdump:
    g++ -c -std=c++0x -pg -g -O2 test.cpp
    objdump -d -M intel -S test.o > objdump.txt
    g++ -pg test.o -o test
    ./test
    gprof test > gprof.output
In gprof.output I noticed that test2() takes longer than test1(), but I cannot explain why:
    Each sample counts as 0.01 seconds.
        % cumulative     self                self    total
      time   seconds  seconds      calls  ms/call  ms/call  name
     49.40      0.41     0.41          1   410.00   540.00  test2()
     31.33      0.67     0.26  200000000     0.00     0.00  Print::print(int, float, float, float)
     19.28      0.83     0.16          1   160.00   290.00  test1()
      0.00      0.83     0.00          1     0.00     0.00  global constructors keyed to p
The generated assembly in objdump.txt doesn't help me either:
How can we explain this difference?
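For anyone who wants to cross-check the gprof figures with plain wall-clock timing, here is a minimal sketch. It is not the original test.cpp: the bodies of test1() and test2() are not fully shown above, so `run_direct` and `run_wrapped` are hypothetical stand-ins that assume each test simply loops over the call. The `calls` counter, the smaller `kIter`, and the `elapsed_ms` helper are also my own additions for illustration, and the code assumes a C++11 compiler.

```cpp
#include <chrono>

// Assumed iteration count; smaller than ITER in test.cpp to keep runs short.
const long kIter = 10000000;

class Print {
public:
    Print() : calls(0) {}
    virtual ~Print() {}
    // Unlike the empty original, this counts calls so that the work
    // done by each loop can be verified afterwards.
    virtual void print(int, float, float, float) { ++calls; }
    long calls;
};

class PrintWrapper {
public:
    PrintWrapper(Print* print, int p1, float p2, float p3, float p4)
        : m_print(print), _p1(p1), _p2(p2), _p3(p3), _p4(p4) {}
    // Forwards the stored arguments to the virtual call.
    void execute() { m_print->print(_p1, _p2, _p3, _p4); }
private:
    Print* m_print;
    int _p1;
    float _p2, _p3, _p4;
};

// Hypothetical stand-in for test1(): direct virtual call in a loop.
void run_direct(Print* p) {
    for (long i = 0; i < kIter; ++i) p->print(1, 2.0f, 3.0f, 4.0f);
}

// Hypothetical stand-in for test2(): the same call through the wrapper.
void run_wrapped(PrintWrapper* pw) {
    for (long i = 0; i < kIter; ++i) pw->execute();
}

// Runs f once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double elapsed_ms(F f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

From `main`, one would create a `Print` and a `PrintWrapper`, then call `elapsed_ms([&] { run_direct(&p); })` and `elapsed_ms([&] { run_wrapped(&pw); })` and print both results. Building this without `-pg` leaves out the profiling hook that gprof instrumentation adds at function entry, so the two timings can be compared against the gprof numbers.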