How to use google perf tools

Question

How to use google perf tools

I just started using google performance tools ( google-perftools and libgoogle-perftools4 in ubuntu), I swear that I work during the day and I did not find the answer! The problem is that I am not getting the result for ALL of my CPU profiling functions. This is my code:

 #include "gperftools/profiler.h" #include <iostream> #include <math.h> using namespace std; void bar() { int a,b,c,d,j,k; a=0; int z=0; b = 1000; while(z < b) { while (a < b) { d = sin(a); c = cos(a); j = tan(a); k = tan(a); k = d * c + j *k; a++; } a = 0; z++; } } void foo() { cout << "hey " << endl; } int main() { ProfilerStart("/home/mohammad/gperf/dump.txt"); int a = 1000; while(a--){foo();} bar(); ProfilerFlush(); ProfilerStop(); }

Compiled as g++ test.cc -lprofiler -o a.out

this is how i run the code:

 CPUPROFILE=dump.txt ./a.out

I also tried this:

 CPUPROFILE_FREQUENCY=10000 LD_PRELOAD=/usr/local/lib/libprofiler.so.0.3.0 CPUPROFILE=dump.txt ./a.out

And here is what I get from google-pprof --text a.out dump.txt :

 Using local file ./a.out. Using local file ./dump.txt. Total: 22 samples 8 36.4% 36.4% 8 36.4% 00d8cb04 6 27.3% 63.6% 6 27.3% bar 3 13.6% 77.3% 3 13.6% __cos (inline) 2 9.1% 86.4% 2 9.1% 00d8cab4 1 4.5% 90.9% 1 4.5% 00d8cab6 1 4.5% 95.5% 1 4.5% 00d8cb06 1 4.5% 100.0% 1 4.5% __write_nocancel 0 0.0% 100.0% 3 13.6% __cos

But there is no information about the function foo!

my system info: ubuntu 12.04 g ++ 4.6.3

That's all!

+6

profiling gperftools

Mohammad Jun 03 '12 at 10:09

source share

2 answers

Try setting LD_PRELOAD only

LD_PRELOAD = / usr / local / library / libprofiler.so

It looks like there are problems transferring a shared library that does not end with the suffix .so.

0

volpato May 14, '13 at 17:31

source share

osgx · Accepted Answer · 2014-06-01T18:28:28+0000

TL DR: foo - fast and small to receive profiling events, run it another 100 times. The frequency setting was misspelled, and pprof will not be counted more often than CONFIG_HZ (usually 250). It is better to upgrade to a more modern Linux perf profiler ( tutorial from its authors , wikipedia ).

Long version:

Your foo function is too short and simple - just call two functions. Compiled the test using g++ test.cc -lprofiler -o test.s -S -g filtering test.s using the c++filt program to make C ++ readable names:

 foo(): .LFB972: .loc 1 27 0 pushq %rbp movq %rsp, %rbp .loc 1 28 0 movl $.LC0, %esi movl std::cout, %edi call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) movl std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&), %esi movq %rax, %rdi call std::basic_ostream<char, std::char_traits<char> >::operator<<(std::basic_ostream<char, std::char_traits<char> >& (*)(std::basic_ostream<char, std::char_traits<char> >&)) .loc 1 29 0 popq %rbp ret .LFE972: .size foo(), .-foo()

So, to see it in your profile, you must run foo again, changing int a = 1000; mainly for something much more, for example 10,000 or better than 100,000 (like me for the test).

You can also fix the incorrect " CPUPROFILE_FREQUENC=10000 " to fix CPUPROFILE_FREQUENCY (note Y ). I have to say that 10000 is too large for CPUPROFILE_FREQUENCY, because usually it can only generate 1000 or 250 events per second depending on the configuration of the CONFIG_HZ kernel (most 3.x kernels have 250, check grep CONFIG_HZ= /boot/config* ) . And the default value for CPUPROFILE_FREQUENCY in pprof is 100.

I tested various CPUPROFILE_FREQUENCY values: 100000, 10000, 1000, 250 with a bash script on Ubuntu 14.04

 for a in 100000 100000 10000 10000 1000 1000 300 300 250 250; do echo -n "$a "; CPUPROFILE_FREQUENCY=$a CPUPROFILE=dump$a.txt ./test >/dev/null; done

And the results were the same 120-140 events and each lead time. / test is about 0.5 seconds, so the cpuprofiler from google-perftools cannot do more events per second for a single thread than CONFIG_HZ installed in the kernel (I have 250).

 100000 PROFILE: interrupts/evictions/bytes = 124/1/6584 100000 PROFILE: interrupts/evictions/bytes = 134/0/7864 10000 PROFILE: interrupts/evictions/bytes = 125/0/7488 10000 PROFILE: interrupts/evictions/bytes = 123/0/6960 1000 PROFILE: interrupts/evictions/bytes = 134/0/6264 1000 PROFILE: interrupts/evictions/bytes = 125/2/7272 300 PROFILE: interrupts/evictions/bytes = 137/2/7984 300 PROFILE: interrupts/evictions/bytes = 126/0/7216 250 PROFILE: interrupts/evictions/bytes = 123/3/6680 250 PROFILE: interrupts/evictions/bytes = 137/2/7352

With the original functions a = 1000, foo and cout is too fast to get any profiling event (even at 250 events / s) in each run, so you do not have foo , as well as no I / O functions. __write_nocancel can receive a selection event, and then foo and I / O functions of the form libstdc ++ will be reported (somewhere not at the very top, so use the --text option pprof or google-pprof ) with zero number of native events and non-zero number of child events:

  .... 1 0.9% 99.1% 1 0.9% __write_nocancel .... 0 0.0% 100.0% 1 0.9% _IO_new_file_overflow 0 0.0% 100.0% 1 0.9% _IO_new_file_write 0 0.0% 100.0% 1 0.9% __GI__IO_putc 0 0.0% 100.0% 1 0.9% foo 0 0.0% 100.0% 1 0.9% new_do_write 0 0.0% 100.0% 1 0.9% std::endl 0 0.0% 100.0% 1 0.9% std::ostream::put

With a=100000 foo is still too short and fast to receive its own events, but the I / O functions got a few. This is the list I selected from the long --text output:

  34 24.6% 24.6% 34 24.6% __write_nocancel 1 0.7% 95.7% 35 25.4% __GI__IO_fflush 1 0.7% 96.4% 1 0.7% __GI__IO_putc 1 0.7% 97.8% 2 1.4% std::operator<< 1 0.7% 98.6% 36 26.1% std::ostream::flush 1 0.7% 99.3% 2 1.4% std::ostream::put 0 0.0% 100.0% 34 24.6% _IO_new_file_sync 0 0.0% 100.0% 34 24.6% _IO_new_file_write 0 0.0% 100.0% 40 29.0% foo 0 0.0% 100.0% 34 24.6% new_do_write 0 0.0% 100.0% 2 1.4% std::endl

Functions with zero intrinsic counters, observable only due to the possibility of pprof reading call chains (he knows who calls the functions that received the sample if the information about the frame is not omitted).

I can also recommend more modern, more capable (both software and hardware events, frequency up to 5 kHz, etc., both user space and kernel profiling) and a more convenient profiler, Linux perf profiler (a tutorial from its authors , wikipedia ).

There are results from perf with a=10000 :

 $ perf record ./test3 >/dev/null ... skip some perf spam about inaccessibility of kernel symbols ... note the 3 kHz frequency here VVVV Lowering default frequency rate to 3250. Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate. [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.078 MB perf.data (~3386 samples) ]

To see the text report from the perf.data output file, I will use less (because perf report launches the interactive profile browser by default):

 $ perf report |less ... skip some extra info about the machine, kernel, and perf starting command # Samples: 1K of event 'cycles' # Event count (approx.): 1155264208 # Overhead Command Shared Object Symbol 41.94% test3 libm-2.19.so [.] __tan_sse2 16.95% test3 libm-2.19.so [.] __sin_sse2 13.40% test3 libm-2.19.so [.] __cos_sse2 4.93% test3 test3 [.] bar() 2.90% test3 libc-2.19.so [.] __GI___libc_write .... 0.20% test3 test3 [.] foo()

Or perf report -n | less perf report -n | less to see the counters of raw events (samples):

 # Overhead Samples Command Shared Object 41.94% 663 test3 libm-2.19.so [.] __tan_sse2 16.95% 268 test3 libm-2.19.so [.] __sin_sse2 13.40% 212 test3 libm-2.19.so [.] __cos_sse2 4.93% 78 test3 test3 [.] bar() 2.90% 62 test3 libc-2.19.so [.] __GI___libc_write .... 0.20% 4 test3 test3 [.] foo()

How to use google perf tools

More articles: