What does "Samples" mean in perf?

I used linux perf to profile my program and I cannot figure out the result.

  10.5% 2 fun ..........
       |
       | - 80% - ABC
       |  call_ABC
       - 20% - DEF
                call_DEF

The example above means that "fun" has two samples and contributes 10.5% of the overhead, and that 80% of them are called from ABC and 20% from DEF. Am I right?

But there are only two samples, so how does perf calculate the proportions for ABC and DEF?

Why aren't they 50% each? Does perf use some additional information?

1 answer

The example above means that "fun" has two samples and contributes 10.5% of the overhead,

Yes, this part of the perf report -g -n output shows that 2 of the 19 samples (2 is 10.5% of 19) were in the fun function itself. The other 17 samples were taken in other functions.

I just recreated your code with a recent gcc ( -static -O3 -fno-inline -fno-omit-frame-pointer -g ) and perf ( perf record -e cycles:u -c 500000 -g ./test12968422 for low-resolution sampling, or -c 5000 for high resolution). perf now has different weighting rules, but the idea should be the same. If the program has only 2 samples and both are in fun, the call graph ( perf report -n -g callee ) shows 50% for each of call_DEF / call_ABC (no additional information is used). The program actually spent 86% of its run time in fun: 61% when called from ABC and 25% when called from DEF. The report with only 2 samples:

  100%  2  fun
        - fun
           + 50% call_DEF
           + 50% call_ABC

What additional information could be used to recover the split? I think it could be the weights of call_DEF and call_ABC themselves, or it could be the frequency of the "call_ABC -> fun" and "call_DEF -> fun" parts of the call chain across all sampled call stacks.

With perf from recent Linux kernel versions (4.4 / 4.10) I cannot reproduce your situation. I added different amounts of independent work to call_ABC and call_DEF; both of them then just call fun for a fixed amount of work. Now I have 19 samples with -e cycles:u -c 54000 -g : 13 for call_ABC, 2 for call_DEF, 2 for fun (and 2 in some random functions):

  Children  Self   Samples  Symbol
  74%       68%    13       [.] call_ABC
  16%       10.5%  2        [.] call_DEF
  10.5%     10.5%  2        [.] fun
            - fun
               + 5.26% call_ABC
               + 5.26% call_DEF

So try a newer version of perf, not one from the era of the 3.2 Linux kernels.

The first source, where all the work is done in fun and is split unequally between the calls from ABC and from DEF:

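    #define g 100000
    int a[2+g];

    /* Build a chain: a[i] points to i-1, and a[1] == 0 ends the walk. */
    void fill_a() {
        a[0] = 0;
        for (int f = 0; f < g; f++)
            a[f+1] = f;
    }

    /* Walk the chain from b; the work is proportional to b. */
    int fun(int b) {
        while (a[b])
            b = a[b];
        return b;
    }

    int call_ABC(int b) {
        int d = b;
        b = fun(d);
        return d - b;
    }

    int call_DEF(int b) {
        int e = b;
        b = fun(e);
        return e + b;
    }

    int main() {
        int c, d;
        fill_a();
        d = call_ABC(100000);  /* more work in fun */
        c = call_DEF(45000);   /* less work in fun */
        return c + d;
    }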

The second source, with unequal work in ABC and DEF themselves and the same small amount of work in fun:

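    #define g 100000
    int a[2+g];

    /* Build a chain: a[i] points to i-1, and a[1] == 0 ends the walk. */
    void fill_a() {
        a[0] = 0;
        for (int f = 0; f < g; f++)
            a[f+1] = f;
    }

    /* Walk the chain from b; the work is proportional to b. */
    int fun(int b) {
        while (a[b])
            b = a[b];
        return b;
    }

    int call_ABC(int b) {
        int d = b;
        while (a[d])           /* independent work inside call_ABC */
            d = a[d];
        b = fun(5000);         /* fixed amount of work in fun */
        return d - b;
    }

    int call_DEF(int b) {
        int e = b;
        while (a[e])           /* independent work inside call_DEF */
            e = a[e];
        b = fun(5000);         /* fixed amount of work in fun */
        return e + b;
    }

    int main() {
        int c, d;
        fill_a();
        d = call_ABC(100000);
        c = call_DEF(20000);
        return c + d;
    }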

Source: https://habr.com/ru/post/1440704/

