TL DR: foo
- fast and small to receive profiling events, run it another 100 times. The frequency setting was misspelled, and pprof
will not be counted more often than CONFIG_HZ (usually 250). It is better to upgrade to a more modern Linux perf
profiler ( tutorial from its authors , wikipedia ).
Long version:
Your foo
function is too short and simple - just call two functions. Compiled the test using g++ test.cc -lprofiler -o test.s -S -g
filtering test.s
using the c++filt
program to make C ++ readable names:
foo(): .LFB972: .loc 1 27 0 pushq %rbp movq %rsp, %rbp .loc 1 28 0 movl $.LC0, %esi movl std::cout, %edi call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) movl std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&), %esi movq %rax, %rdi call std::basic_ostream<char, std::char_traits<char> >::operator<<(std::basic_ostream<char, std::char_traits<char> >& (*)(std::basic_ostream<char, std::char_traits<char> >&)) .loc 1 29 0 popq %rbp ret .LFE972: .size foo(), .-foo()
So, to see it in your profile, you must run foo
again, changing int a = 1000;
mainly for something much more, for example 10,000 or better than 100,000 (like me for the test).
You can also fix the incorrect " CPUPROFILE_FREQUENC=10000
" to fix CPUPROFILE_FREQUENCY
(note Y
). I have to say that 10000 is too large for CPUPROFILE_FREQUENCY, because usually it can only generate 1000 or 250 events per second depending on the configuration of the CONFIG_HZ
kernel (most 3.x kernels have 250, check grep CONFIG_HZ= /boot/config*
) . And the default value for CPUPROFILE_FREQUENCY in pprof is 100.
I tested various CPUPROFILE_FREQUENCY values: 100000, 10000, 1000, 250 with a bash script on Ubuntu 14.04
for a in 100000 100000 10000 10000 1000 1000 300 300 250 250; do echo -n "$a "; CPUPROFILE_FREQUENCY=$a CPUPROFILE=dump$a.txt ./test >/dev/null; done
And the results were the same 120-140 events and each lead time. / test is about 0.5 seconds, so the cpuprofiler from google-perftools cannot do more events per second for a single thread than CONFIG_HZ installed in the kernel (I have 250).
100000 PROFILE: interrupts/evictions/bytes = 124/1/6584 100000 PROFILE: interrupts/evictions/bytes = 134/0/7864 10000 PROFILE: interrupts/evictions/bytes = 125/0/7488 10000 PROFILE: interrupts/evictions/bytes = 123/0/6960 1000 PROFILE: interrupts/evictions/bytes = 134/0/6264 1000 PROFILE: interrupts/evictions/bytes = 125/2/7272 300 PROFILE: interrupts/evictions/bytes = 137/2/7984 300 PROFILE: interrupts/evictions/bytes = 126/0/7216 250 PROFILE: interrupts/evictions/bytes = 123/3/6680 250 PROFILE: interrupts/evictions/bytes = 137/2/7352
With the original functions a = 1000, foo
and cout is too fast to get any profiling event (even at 250 events / s) in each run, so you do not have foo
, as well as no I / O functions. __write_nocancel
can receive a selection event, and then foo
and I / O functions of the form libstdc ++ will be reported (somewhere not at the very top, so use the --text
option pprof
or google-pprof
) with zero number of native events and non-zero number of child events:
.... 1 0.9% 99.1% 1 0.9% __write_nocancel .... 0 0.0% 100.0% 1 0.9% _IO_new_file_overflow 0 0.0% 100.0% 1 0.9% _IO_new_file_write 0 0.0% 100.0% 1 0.9% __GI__IO_putc 0 0.0% 100.0% 1 0.9% foo 0 0.0% 100.0% 1 0.9% new_do_write 0 0.0% 100.0% 1 0.9% std::endl 0 0.0% 100.0% 1 0.9% std::ostream::put
With a=100000
foo is still too short and fast to receive its own events, but the I / O functions got a few. This is the list I selected from the long --text
output:
34 24.6% 24.6% 34 24.6% __write_nocancel 1 0.7% 95.7% 35 25.4% __GI__IO_fflush 1 0.7% 96.4% 1 0.7% __GI__IO_putc 1 0.7% 97.8% 2 1.4% std::operator<< 1 0.7% 98.6% 36 26.1% std::ostream::flush 1 0.7% 99.3% 2 1.4% std::ostream::put 0 0.0% 100.0% 34 24.6% _IO_new_file_sync 0 0.0% 100.0% 34 24.6% _IO_new_file_write 0 0.0% 100.0% 40 29.0% foo 0 0.0% 100.0% 34 24.6% new_do_write 0 0.0% 100.0% 2 1.4% std::endl
Functions with zero intrinsic counters, observable only due to the possibility of pprof
reading call chains (he knows who calls the functions that received the sample if the information about the frame is not omitted).
I can also recommend more modern, more capable (both software and hardware events, frequency up to 5 kHz, etc., both user space and kernel profiling) and a more convenient profiler, Linux perf
profiler (a tutorial from its authors , wikipedia ).
There are results from perf
with a=10000
:
$ perf record ./test3 >/dev/null ... skip some perf spam about inaccessibility of kernel symbols ... note the 3 kHz frequency here VVVV Lowering default frequency rate to 3250. Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate. [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.078 MB perf.data (~3386 samples) ]
To see the text report from the perf.data
output file, I will use less (because perf report
launches the interactive profile browser by default):
$ perf report |less ... skip some extra info about the machine, kernel, and perf starting command
Or perf report -n | less
perf report -n | less
to see the counters of raw events (samples):
# Overhead Samples Command Shared Object 41.94% 663 test3 libm-2.19.so [.] __tan_sse2 16.95% 268 test3 libm-2.19.so [.] __sin_sse2 13.40% 212 test3 libm-2.19.so [.] __cos_sse2 4.93% 78 test3 test3 [.] bar() 2.90% 62 test3 libc-2.19.so [.] __GI___libc_write .... 0.20% 4 test3 test3 [.] foo()