CPU utilization measurement on a bare-metal system

I am working on an ARM Cortex-M4 evaluation board. It is a bare-metal application, without any operating system.

Now I want to measure the CPU utilization of a given process / algorithm. What would be the best way to do this?

Should I implement an operating system that has functionality for measuring CPU usage?

+4
3 answers

The question almost answers itself. What does your bare-metal application do when it is not in this process / algorithm? Measure one or the other, or both. If you have a bare-metal application that is not completely consumed by this algorithm, then you already have an operating system of sorts, to the extent that you manage how much time this application / function gets. You can use several methods, starting with a simple counter in a loop compared against a timer, to see how many counts you get per timer period when the algorithm receives time slices versus when it does not. You can also just time the algorithm itself, etc.
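A minimal sketch of the counter-in-a-loop idea, assuming a main loop that calls `idle_tick()` whenever it has nothing to do and a periodic timer interrupt that calls `on_window_timer()`. All names here are made up for illustration and do not come from any vendor library. You calibrate once with the algorithm disabled to get the baseline count per timer window, then compare it with the count you get while the algorithm runs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical globals: idle-loop iterations in the current timer window,
// and the snapshot taken at the end of the previous window.
volatile std::uint32_t g_idle_count = 0;
volatile std::uint32_t g_window_idle = 0;

// Call from the idle part of the main loop (i.e. when the algorithm is not running).
inline void idle_tick() { g_idle_count = g_idle_count + 1; }

// Call from a periodic timer interrupt: latch the count and start a new window.
inline void on_window_timer()
{
    g_window_idle = g_idle_count;
    g_idle_count = 0;
}

// baseline_idle: counts per window measured with the algorithm disabled.
// loaded_idle:   counts per window measured with the algorithm running.
// Returns the estimated share of the window consumed by the algorithm, in percent.
inline std::uint32_t utilization_percent(std::uint32_t baseline_idle,
                                         std::uint32_t loaded_idle)
{
    assert(baseline_idle != 0 && "calibrate with the algorithm disabled first");
    if (loaded_idle >= baseline_idle) {
        return 0; // no measurable load (or measurement noise)
    }
    const std::uint64_t busy = baseline_idle - loaded_idle;
    return static_cast<std::uint32_t>((busy * 100u) / baseline_idle);
}
```

For example, a baseline of 1000 counts per window that drops to 250 with the algorithm enabled suggests roughly 75 % utilization. The estimate is only as good as the calibration: it has to be redone if the clock, the flash wait states, or the compiler options change.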

I assume that when you say CPU you mean the whole system, since your performance depends heavily on both your code and what it is talking to. If you are running from flash on a Cortex-M4, then depending on the clock frequency you can burn processor cycles simply waiting for instructions or data (and you can very easily get a false impression of the CPU cost of an algorithm when it is not the algorithm that is burning the clocks). Caches mask / manipulate this and can greatly affect performance if you are not careful and do not know what they are doing. This being a C++ question, your compiler plays a big role in performance, as does your code of course: minimal changes to the command line or to the code can very easily make it run several times faster or slower.

If the algorithm is part of an ISR and the processor sleeps otherwise, you can use a GPIO pin and an oscilloscope to get an idea of the ratio of awake time to sleep time.
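The GPIO approach could look something like the sketch below. It assumes a memory-mapped GPIO output data register whose address and bit layout are vendor specific, so the register is passed in as a pointer. `ScopeMarker` is a made-up helper name, not part of any library. The pin goes high when the object is created (for example at ISR entry) and low when it goes out of scope, so the duty cycle seen on the oscilloscope approximates the awake-to-asleep ratio.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical RAII helper: drives one bit of an output register high for
// the lifetime of the object, so an oscilloscope shows when the code is running.
class ScopeMarker {
public:
    ScopeMarker(volatile std::uint32_t* out_reg, unsigned bit)
        : reg_(out_reg), mask_(1u << bit)
    {
        assert(out_reg != nullptr && bit < 32);
        *reg_ = *reg_ | mask_;   // pin high: work starts
    }

    ~ScopeMarker() { *reg_ = *reg_ & ~mask_; }  // pin low: work done

    ScopeMarker(const ScopeMarker&) = delete;
    ScopeMarker& operator=(const ScopeMarker&) = delete;

private:
    volatile std::uint32_t* reg_;
    std::uint32_t mask_;
};
```

Note that the read-modify-write on the register is not atomic. If other interrupts touch the same port, a dedicated set/reset register is the safer choice where the part provides one.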

+5

Implementing an OS just to measure CPU idle time seems like overkill to me. As far as I know, the Cortex-M4 includes a debug unit (DWT) that lets you take a snapshot of the cycle counter. But the easiest way would be to connect a pin to an oscilloscope and toggle it at the entry and exit of your algorithm.
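A rough sketch of reading the DWT cycle counter, assuming the part actually implements it (some devices may not, so check the reference manual). The register addresses are the architectural ARMv7-M ones: DEMCR at 0xE000EDFC, DWT_CTRL at 0xE0001000, DWT_CYCCNT at 0xE0001004. The `CycleCounter` wrapper and the function names are made up here. The registers are passed as pointers only so the logic can also be exercised off target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical thin wrapper around the three registers involved.
struct CycleCounter {
    volatile std::uint32_t* demcr;   // Debug Exception and Monitor Control Register
    volatile std::uint32_t* ctrl;    // DWT_CTRL
    volatile std::uint32_t* cyccnt;  // DWT_CYCCNT

    void enable() const
    {
        assert(demcr && ctrl && cyccnt);
        *demcr = *demcr | (1u << 24); // TRCENA: enable the DWT unit
        *cyccnt = 0;                  // start counting from zero
        *ctrl = *ctrl | 1u;           // CYCCNTENA: start the cycle counter
    }

    std::uint32_t now() const { return *cyccnt; }
};

// Unsigned subtraction gives the right answer across a single 32-bit wraparound.
inline std::uint32_t elapsed_cycles(std::uint32_t start, std::uint32_t end)
{
    return end - start;
}

// Only meaningful on a Cortex-M3/M4 target; never dereference this on a host.
inline CycleCounter cortex_m_cycle_counter()
{
    return CycleCounter{
        reinterpret_cast<volatile std::uint32_t*>(static_cast<std::uintptr_t>(0xE000EDFCu)),
        reinterpret_cast<volatile std::uint32_t*>(static_cast<std::uintptr_t>(0xE0001000u)),
        reinterpret_cast<volatile std::uint32_t*>(static_cast<std::uintptr_t>(0xE0001004u))};
}
```

On target you would call `enable()` once, take `now()` before and after the algorithm, and pass both values to `elapsed_cycles()`. The 32-bit counter wraps after 2^32 cycles, so longer measurements need a different approach.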

+4

Firstly, implementing an entire operating system only to measure performance is not practical, and may not even be feasible. One possible way is to keep a count variable that records the number of ticks that occurred during the interval of interest, and to increment this variable in a timer interrupt.
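Something along these lines, as a sketch only. `timer_tick_isr` stands in for whatever periodic timer handler your startup code provides (the name is an assumption), and `ticks_taken` is a made-up helper that reads the tick count before and after the code under test.

```cpp
#include <cassert>
#include <cstdint>

// Tick count, incremented once per timer period.
volatile std::uint32_t g_ticks = 0;

// Hypothetical handler: hook this into your periodic timer interrupt.
void timer_tick_isr() { g_ticks = g_ticks + 1; }

// Runs the given callable and returns how many ticks elapsed meanwhile.
// Unsigned subtraction stays correct across one wraparound of the counter.
template <typename Fn>
std::uint32_t ticks_taken(Fn&& algorithm)
{
    const std::uint32_t start = g_ticks;
    algorithm();
    return g_ticks - start;
}
```

The resolution is one timer period, so this suits algorithms that run for many ticks. For anything shorter, a faster timer or a cycle counter gives a more useful number.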

+1

Source: https://habr.com/ru/post/1491326/
