How to get call graph profiling using gcc compiled code and ARM Cortex A8 target?

Question


I've been banging my head against this for a while...

I need to profile my code on the ARM board, and I need to see call graphs. I tried OProfile, the kernel's perf tool and Google Performance Tools. All of them work, but none of them shows any call graph information.

This led me to conclude that I was not compiling my code correctly.

When compiling my C++ code, I use the following flags:

Architecture-specific:

-march=armv7-a -mtune=cortex-a8 -mfloat-abi=hard -mfpu=vfpv3

General:

 -fexceptions -fno-strict-aliasing -D_REENTRANT -Wall -Wextra

Debugging (with optimization):

 -O2 -g -fno-omit-frame-pointer
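To separate "the profiler cannot walk the stack" from "something about my real application", it can help to profile a tiny program whose call tree is known in advance. The sketch below is hypothetical (the function names are made up for illustration) and is meant to be built with the flags above. `__attribute__((noinline))` keeps `-O2` from merging the functions into their callers, so every edge in the call tree should survive into the binary.

```cpp
#include <cstdint>

// Hypothetical test workload with a known call tree:
//   top_level -> middle -> leaf_sum (leaf_sum is called twice per middle call)
// noinline stops -O2 from inlining these, so each call edge stays visible
// to a call-graph profiler.

__attribute__((noinline)) std::uint64_t leaf_sum(std::uint64_t n) {
    // volatile prevents GCC from replacing the loop with n*(n-1)/2,
    // so this leaf stays hot enough to collect samples.
    volatile std::uint64_t s = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        s = s + i;
    }
    return s;
}

__attribute__((noinline)) std::uint64_t middle(std::uint64_t n) {
    return leaf_sum(n) + leaf_sum(n / 2);
}

__attribute__((noinline)) std::uint64_t top_level(std::uint64_t iterations,
                                                   std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t k = 0; k < iterations; ++k) {
        total += middle(n);
    }
    return total;
}
```

A driver such as `int main() { return top_level(20000, 10000) == 0; }` runs for a few seconds. Under perf, for example, `perf record -g ./callgraph-test` followed by `perf report` (the binary name is hypothetical) should show the `top_level -> middle -> leaf_sum` chain if user-space call chains are being recorded. If it shows only flat per-function entries, the problem lies in stack walking rather than in the application code.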

I searched a lot on Google and found some related topics:

libunwind?
DWARF
(asynchronous) unwind tables
-mapcs-frame
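Of the topics listed, unwind tables are what a stack walker reads to step from a frame to its caller without relying on a frame pointer. A minimal sketch of such a walk, using GCC's libgcc unwinder (`_Unwind_Backtrace` and `_Unwind_GetIP` from `<unwind.h>`), is below; the helper names `record_frame` and `capture_call_stack` are hypothetical. It returns raw program counters only, and symbolizing them is left out. If it reports just one or two frames on the Cortex-A8 build, that code probably lacks usable unwind information (for plain C, `-funwind-tables` or `-fasynchronous-unwind-tables` asks GCC to emit it). Which mechanism a particular profiler uses (unwind tables, frame pointers or something else) varies by tool and kernel version, so this only checks that the unwind data itself is usable.

```cpp
#include <unwind.h>   // GCC's libgcc unwinder interface
#include <cstdint>
#include <vector>

// Called by _Unwind_Backtrace once per stack frame, innermost frame first.
static _Unwind_Reason_Code record_frame(struct _Unwind_Context* ctx, void* arg) {
    auto* pcs = static_cast<std::vector<std::uintptr_t>*>(arg);
    const std::uintptr_t pc = static_cast<std::uintptr_t>(_Unwind_GetIP(ctx));
    if (pc != 0) {
        pcs->push_back(pc);
    }
    return _URC_NO_REASON;  // keep walking towards the outermost frame
}

// Walks the current call stack using the unwind tables emitted by the
// compiler (.eh_frame on x86-64, .ARM.exidx/.ARM.extab under the ARM EHABI)
// and returns one program counter per frame.
__attribute__((noinline)) std::vector<std::uintptr_t> capture_call_stack() {
    std::vector<std::uintptr_t> pcs;
    _Unwind_Backtrace(record_frame, &pcs);
    return pcs;
}
```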

However, I don't really understand how all of this fits together. Any tips on how to get call graphs working?

Note (in response to Ian's answer): I want to know whether, and why, some methods take longer (relative to others) on ARM than on x86-64. Profiling on another platform won't answer that (although my code is compiled for both, and call graphs do work on x86-64).
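When the goal is comparing the relative cost of specific methods on ARM versus x86-64, manual instrumentation is a fallback that does not depend on the profiler's call-graph support at all. It is a different technique from sampling, and the sketch below is hypothetical: `ScopedTimer`, `timing_registry` and `print_timings` are illustrative names, not an existing library. Each timer measures wall-clock time between construction and destruction with `std::chrono::steady_clock`, and the printed percentages can be compared between the two builds.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <utility>

// Accumulated wall-clock time and call count for one label.
struct TimingStats {
    std::chrono::nanoseconds total{0};
    std::uint64_t calls = 0;
};

// Process-wide table of timings keyed by label.
// Not thread-safe: guard it with a mutex if timed code runs on several threads.
std::map<std::string, TimingStats>& timing_registry() {
    static std::map<std::string, TimingStats> registry;
    return registry;
}

// RAII timer: records the time between construction and destruction
// under `label` in timing_registry().
class ScopedTimer {
public:
    explicit ScopedTimer(std::string label)
        : label_(std::move(label)), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        TimingStats& stats = timing_registry()[label_];
        stats.total += std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
        ++stats.calls;
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};

// Prints each label's share of the summed time. Labels should mark
// non-overlapping regions; nested timers are counted twice in the sum.
void print_timings() {
    std::chrono::nanoseconds sum{0};
    for (const auto& entry : timing_registry()) sum += entry.second.total;
    for (const auto& entry : timing_registry()) {
        const double share = sum.count() > 0
            ? 100.0 * static_cast<double>(entry.second.total.count()) /
                  static_cast<double>(sum.count())
            : 0.0;
        std::printf("%-24s %10llu calls %14lld ns %6.2f%%\n", entry.first.c_str(),
                    static_cast<unsigned long long>(entry.second.calls),
                    static_cast<long long>(entry.second.total.count()), share);
    }
}
```

Usage: put `ScopedTimer t("some_method");` as the first statement of each method of interest and call `print_timings()` before `main` returns. The timer itself adds some overhead per call, so very short, frequently called methods will look more expensive than they are on both targets.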


gcc stack-trace arm cortex-a8 oprofile

Hanno S. Nov 30 '11 at 17:22


1 answer

Ian sanderson · Answer 1 · 2011-11-30T17:32:49+0000

I know you want to profile on the ARM Cortex-A8, but if you are interested in call graphs, why not compile for x86, run Valgrind's callgrind tool and inspect the results with KCachegrind?

The call graphs should be the same on both architectures: even if the compiler generates slightly different code for individual functions, the calling relationships between them should not change.

No special flags are required:

 valgrind --tool=callgrind -v --dump-every-bb=10000000 ./some-app
 kcachegrind &
