Carefully evaluate what you mean by "profiling." You really are working very close to the bare metal, and you will probably have to take on some of the work that a tool like gprof would normally do for you.
Do you want to time a function call? Or an ISR? How about toggling a GPIO line on entering and exiting the code under inspection? A data logger or oscilloscope can be set to trigger on these events. (In my experience, a data logger is more convenient, since mine can be configured to capture a sequence of these events, which lets me compute average timings.)
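As a rough sketch of the GPIO approach: the register pointer `probe_reg`, the pin mask `PROBE_PIN_MASK`, and the function `code_under_test` are all made-up names for illustration. The real data-out register address and pin number depend on your chip and board.

```c
#include <stdint.h>

/* Hypothetical: points at the GPIO data-out register of your chip.
 * Must be set to the real register address during init. */
static volatile uint32_t *probe_reg;

/* Hypothetical: bit of the pin wired to the scope or data logger. */
#define PROBE_PIN_MASK (1u << 5)

/* Drive the probe pin high on entry to the code being timed. */
static inline void probe_enter(void) { *probe_reg |= PROBE_PIN_MASK; }

/* Drive the probe pin low again on exit. */
static inline void probe_exit(void) { *probe_reg &= ~PROBE_PIN_MASK; }

/* Placeholder for whatever function or ISR body you want to time. */
static void code_under_test(void)
{
    for (volatile int i = 0; i < 1000; i++) { }
}

/* The width of the high pulse seen on the pin is the execution time,
 * plus the small overhead of the two register writes. */
static void timed_call(void)
{
    probe_enter();
    code_under_test();
    probe_exit();
}
```

Note that the read-modify-write on the register is not atomic. If an ISR touches the same port, dedicated set/clear registers (if your part has them) are a safer choice.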
Do you want to count the number of executions? The Cortex A8 comes equipped with a number of features (such as configurable event counters) that may help: link. Your ARM chip may have other peripherals that could be used as well (depending on the vendor). Regardless, have a look at the link above: the newer ARM cores have many interesting features that I don't get to play with as much as I would like ;-)
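For what it's worth, here is a minimal sketch of counting calls and cycles with the ARMv7-A performance monitor cycle counter, which the Cortex A8 implements. It assumes you are running in a privileged mode on bare metal. The `profile_slot` structure and helper names are my own invention, and the `#else` branch is only a stand-in so the bookkeeping can be compiled and tried on a desktop machine.

```c
#include <stdint.h>

#if defined(__arm__) && defined(__ARM_ARCH_7A__)
/* Enable the PMU and reset/start the cycle counter (privileged mode). */
static inline void pmu_enable_cycle_counter(void)
{
    uint32_t pmcr;
    __asm__ volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(pmcr)); /* read PMCR */
    pmcr |= (1u << 0) | (1u << 2);  /* E: enable counters, C: reset cycle counter */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(pmcr)); /* write PMCR */
    /* PMCNTENSET bit 31 enables the cycle counter itself. */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(1u << 31));
}

/* Read the 32-bit cycle counter (PMCCNTR). */
static inline uint32_t pmu_read_cycles(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(v));
    return v;
}
#else
/* Stand-in for non-ARM hosts: a fake counter, NOT a real measurement. */
static uint32_t fake_cycles;
static inline void pmu_enable_cycle_counter(void) { fake_cycles = 0; }
static inline uint32_t pmu_read_cycles(void) { return fake_cycles += 100; }
#endif

/* Hypothetical per-function bookkeeping: call count and total cycles. */
struct profile_slot {
    uint32_t calls;
    uint64_t total_cycles;
};

/* Record one execution. The unsigned 32-bit subtraction still gives the
 * right delta if the counter wrapped once between start and end. */
static inline void profile_record(struct profile_slot *s,
                                  uint32_t start, uint32_t end)
{
    s->calls++;
    s->total_cycles += (uint32_t)(end - start);
}
```

Typical use would be to read the counter before and after the code of interest, pass both values to `profile_record`, and divide `total_cycles` by `calls` later to get an average.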