Is kcachegrind / callgrind inaccurate for dispatcher functions?

I have a model program on which kcachegrind / callgrind reports strange results. It is built around a dispatcher function. The dispatcher is called from 4 places; each call says which actual do_J function to run (so first2 will only call do_1 and do_2, etc.).

Source (this is a model of the actual code):

    #define N 1000000
    int a[N];

    int do_1(int *a) { int i; for(i=0;i<N/4;i++) a[i]+=1; }
    int do_2(int *a) { int i; for(i=0;i<N/2;i++) a[i]+=2; }
    int do_3(int *a) { int i; for(i=0;i<N*3/4;i++) a[i]+=3; }
    int do_4(int *a) { int i; for(i=0;i<N;i++) a[i]+=4; }

    int dispatcher(int *a, int j) {
        if(j==1) do_1(a);
        else if(j==2) do_2(a);
        else if(j==3) do_3(a);
        else do_4(a);
    }

    int first2(int *a) { dispatcher(a,1); dispatcher(a,2); }
    int last2(int *a) { dispatcher(a,4); dispatcher(a,3); }
    int inner2(int *a) { dispatcher(a,2); dispatcher(a,3); }
    int outer2(int *a) { dispatcher(a,1); dispatcher(a,4); }

    int main(){ first2(a); last2(a); inner2(a); outer2(a); }

Compiled with gcc -O0; profiled with valgrind --tool=callgrind; viewed with kcachegrind and qcachegrind-0.7.

Here is the full call graph of the application. All paths to do_J go through dispatcher, and that is correct (do_1 is just hidden because it is too fast, but it is really there, just to the left of do_2).

[Image: full call graph]

Now let's focus on do_1 and check who called it (this picture is incorrect):

[Image: call graph focused on do_1]

This is very strange: I believe only first2 and outer2 called do_1, not all of them.

Is this a limitation of callgrind / kcachegrind? How can I get an accurate callgraph with weights (proportional to the runtime of each function, with and without its children)?

1 answer

Yes, this is a limitation of the callgrind format. It does not store a full trace; it only stores information about parent-child (caller-callee) calls.

The google-perftools project provides a CPU profiler, pprof / libprofiler.so: http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html . libprofiler.so can collect a profile with call traces, storing every trace event with its full backtrace. pprof converts libprofiler's output to graphical formats or to callgrind format. In the full view the result will be the same as in kcachegrind, but if you focus on some function, e.g. do_1, using pprof's focus option, it will show an accurate call tree for that function.


Source: https://habr.com/ru/post/905662/

