Every instruction takes more than one clock: fetch, decode, execute. If you are on an STM32 running from flash, the fetch alone probably costs a few clocks just because flash is slow; running from RAM, at 168 MHz or slower, who knows. Branches usually take several clock cycles.
Nobody really talks about clocks per instruction anymore because they are not deterministic. The answer is always "it depends."
It may take X hours to build one car, but if you start building a car, then 30 seconds later start another one, and keep starting a new one every 30 seconds, then after X hours a finished car rolls off the line every 30 seconds. Does that mean it takes 30 seconds to build a car? Of course not. But it does mean that once the line is full, it averages one new car every 30 seconds.
Processors work the same way. Each instruction takes several clocks to execute, but the core is pipelined, so many instructions are in the pipe at the same time, and the average works out so that, if you feed it the right instructions, the core can complete one instruction per clock. With branches and slow memory (flash) you cannot even expect to get that.
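To put numbers on the production-line analogy, here is a toy model of an ideal in-order pipeline. It is a sketch under simplifying assumptions: no stalls other than a fixed refill cost after each taken branch, and the branch penalty is a hypothetical parameter (real values depend on the core and on where the branch target is fetched from). The function names are made up for illustration; a depth of 3 corresponds to the Cortex-M4's fetch/decode/execute pipeline.

```c
#include <assert.h>

/* Toy model of an ideal in-order pipeline (assumption: the only stalls are
 * the refills after taken branches, each costing `branch_penalty` clocks).
 * The first instruction needs `stages` clocks to come out the end; after
 * that one instruction completes per clock, like one car every 30 seconds. */
unsigned long pipeline_cycles(unsigned long n_instr, unsigned stages,
                              unsigned long n_taken_branches,
                              unsigned branch_penalty)
{
    assert(n_instr > 0 && stages > 0);
    /* fill latency + one clock per remaining instruction + branch refills */
    return stages + (n_instr - 1) + n_taken_branches * branch_penalty;
}

/* Average clocks per instruction over the whole run. */
double average_cpi(unsigned long n_instr, unsigned stages,
                   unsigned long n_taken_branches, unsigned branch_penalty)
{
    return (double)pipeline_cycles(n_instr, stages, n_taken_branches,
                                   branch_penalty) / (double)n_instr;
}
```

With 3 stages a single instruction takes 3 clocks, but 1000 back-to-back instructions take 1002 clocks, about 1.002 clocks each. Add 100 taken branches at a hypothetical 2-clock penalty and the same 1000 instructions take 1202 clocks.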
If you want to run an experiment on your processor, write a loop with several hundred nops:
    beg = read timer
    load r0 = 100000
    top:
      nop
      nop
      nop
      nop
      nop
      nop
      ...
      nop
      nop
      nop
      r0 = r0 - 1
      bne top
    end = read timer
If the loop finishes in a fraction of a second, either add more nops or run an order of magnitude more loops. What you want is a significant number of timer ticks: not necessarily seconds or minutes of wall-clock time, but a good number of ticks of whatever timer you are using.
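One rough way to size the run, assuming the only measurement error comes from quantizing the two timer reads (so the elapsed count is off by less than one tick), is to do a short trial run and scale the loop count up. Both helpers are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each timer read is quantized to whole ticks, so the elapsed
 * count is off by less than one tick; this is the relative error bound. */
double worst_case_relative_error(uint32_t elapsed_ticks)
{
    assert(elapsed_ticks > 0);
    return 1.0 / (double)elapsed_ticks;
}

/* Scale the loop count from a short trial run so the real run lasts at
 * least `target_ticks`. Rounds up so the target is never undershot. */
uint64_t loops_for_target(uint64_t trial_loops, uint64_t trial_ticks,
                          uint64_t target_ticks)
{
    assert(trial_loops > 0 && trial_ticks > 0);
    return (target_ticks * trial_loops + trial_ticks - 1) / trial_ticks;
}
```

For example, if a trial of 1000 loops took 250 ticks, `loops_for_target(1000, 250, 1000000)` asks for 4,000,000 loops, and at a million ticks the read-quantization error is below 0.0001%.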
Then do the math and calculate the average.
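A sketch of that math, assuming the timer either counts CPU clocks directly (for example the Cortex-M4 DWT cycle counter, a 32-bit up-counter) or a known fraction of them (SysTick, a 24-bit down-counter, which on many STM32 parts can be clocked at HCLK/8). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks of a free-running 32-bit up-counter. Unsigned subtraction
 * gives the right answer even if the counter wrapped once between reads. */
uint32_t elapsed_up(uint32_t beg, uint32_t end)
{
    return end - beg;
}

/* Elapsed ticks of a 24-bit down-counter such as SysTick. Assumes the
 * reload value is 0xFFFFFF (full range) and at most one wrap occurred. */
uint32_t elapsed_down24(uint32_t beg, uint32_t end)
{
    return (beg - end) & 0x00FFFFFFu;
}

/* Average CPU clocks per instruction for the nop loop.
 * cpu_per_tick:   CPU clocks per timer tick (1 for a cycle counter).
 * instr_per_loop: the nops plus the decrement and the branch. */
double clocks_per_instruction(uint32_t ticks, double cpu_per_tick,
                              uint32_t loops, uint32_t instr_per_loop)
{
    assert(loops > 0 && instr_per_loop > 0);
    return (double)ticks * cpu_per_tick /
           ((double)loops * (double)instr_per_loop);
}
```

With 100 nops per loop (102 instructions counting the decrement and the branch) and 100000 loops, 10,200,000 cycles on a cycle counter works out to 1.0 clock per instruction; 20,400,000 would mean 2.0.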
Repeat the experiment with the program sitting in RAM instead of ROM (flash).
Slow the processor clock down to the fastest speed that does not require flash wait states, and repeat the test from flash.
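To find that clock, a sketch assuming an STM32F405/407-class part powered at 2.7 to 3.6 V, where the reference manual's table steps up one wait state per 30 MHz of HCLK (0 wait states up to 30 MHz, 5 at 168 MHz). Other families and supply ranges use different steps, so check the table for your part; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: STM32F405/407 at 2.7-3.6 V, one extra flash wait state per
 * 30 MHz of HCLK. Other parts and voltage ranges differ. */
#define HCLK_PER_WAIT_STATE_HZ 30000000u

/* Wait states needed at the given HCLK; each flash read then costs roughly
 * (wait states + 1) CPU clocks before any prefetch or cache helps. */
uint32_t flash_wait_states(uint32_t hclk_hz)
{
    assert(hclk_hz > 0);
    return (hclk_hz - 1u) / HCLK_PER_WAIT_STATE_HZ;
}
```

Under that assumption the fastest zero-wait-state clock is 30 MHz, while 168 MHz needs 5 wait states.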
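Since this is a Cortex-M4 part, turn on the instruction cache (on the STM32F4 it lives in ST's flash interface, the ART accelerator, not in the Cortex-M4 core), then repeat from flash and repeat from RAM, at 168 MHz.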
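On an STM32F4 the cache and the wait states are both set in the flash access control register, FLASH->ACR in CMSIS. The sketch below only builds the register value, so it can be checked on a PC; the bit positions are our reading of the STM32F4 reference manual (LATENCY in the low bits, PRFTEN bit 8, ICEN bit 9, DCEN bit 10) and should be verified against the manual for your exact part. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FLASH_ACR fields as we understand them for the STM32F4 (verify first). */
#define ACR_LATENCY_MASK 0x7u       /* bits 2:0 on F405/407 (4 bits on F42x/43x) */
#define ACR_PRFTEN       (1u << 8)  /* prefetch enable */
#define ACR_ICEN         (1u << 9)  /* instruction cache enable */
#define ACR_DCEN         (1u << 10) /* data cache enable */

/* Build a FLASH_ACR value from the wait states and the enables we want. */
uint32_t flash_acr_value(uint32_t wait_states, bool prefetch,
                         bool icache, bool dcache)
{
    assert(wait_states <= ACR_LATENCY_MASK);
    uint32_t v = wait_states & ACR_LATENCY_MASK;
    if (prefetch) v |= ACR_PRFTEN;
    if (icache)   v |= ACR_ICEN;
    if (dcache)   v |= ACR_DCEN;
    return v;
}
```

For the 168 MHz run with only the instruction cache on, `flash_acr_value(5, false, true, false)` gives 0x205. Leaving prefetch and the data cache off keeps the experiment about the I-cache alone; set the latency before raising the clock.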
If all of these experiments, run with the same test loop, did not give you several different results, you are probably doing something wrong.