CPU instruction cycle time

  • I assume that the __no_operation() intrinsic (an ARM NOP) should take 1 / (168 MHz) to execute, provided that each NOP executes in one clock cycle, which I would like to verify against the documentation.

  • Is there a standard place to find instruction cycle time information for a processor? I am trying to determine how long an STM32F407IGH6 processor running at 168 MHz should take to execute a NOP instruction.

  • Some processors require several oscillator cycles per instruction cycle, while others are 1 to 1 when comparing clock cycles with instruction cycles.

  • The term "instruction cycle" does not appear anywhere in the datasheet provided by STMicro, nor in their programming manual (which lists the processor's instruction set, by the way). The 8051 documentation, however, clearly defines its instruction cycle execution time in addition to its machine cycle characteristics.

+4
3 answers

All instructions take more than one clock cycle: fetch, decode, execute. On an STM32 you probably spend several clocks on the fetch alone because of how slow the flash is; if you run from RAM, who knows whether it keeps up at 168 MHz or is slower. The ARM buses generally take several clock cycles to do anything.

Nobody talks about instruction cycles anymore because they are not deterministic. The answer is always "it depends."

It may take X hours to build one car, but if you start building a car, then 30 seconds later start another, and keep starting one every 30 seconds, then after X hours a new car rolls off the line every 30 seconds. Does this mean that it takes 30 seconds to build a car? Of course not. But it does mean that, once the line is running, it averages one new car every 30 seconds.

This is exactly how processors work. Each instruction takes several clocks to execute, but the core is pipelined, so many instructions are in the pipe at the same time. On average, if you feed it instructions properly, the core can complete one instruction per clock. With branches and slow memory / ROM, you cannot expect to reach even that.
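The production-line arithmetic above can be sketched in a few lines of C. This is an idealised model only: the function names are made up for illustration, and it assumes a pipeline that never stalls, which is exactly what branches and slow memory prevent on real hardware.

```c
#include <stdint.h>

/* Idealised pipeline model (an assumption for illustration, not a model of
 * any real core): every instruction spends `stages` clocks in flight, a new
 * instruction enters on every clock, and nothing ever stalls. */
uint64_t pipeline_total_clocks(uint64_t instructions, uint32_t stages)
{
    if (instructions == 0) {
        return 0;
    }
    /* The first instruction needs `stages` clocks to come out the far end;
     * each later one completes one clock after its predecessor. */
    return (uint64_t)stages + (instructions - 1);
}

/* Average clocks per instruction over the whole run. */
double pipeline_average_cpi(uint64_t instructions, uint32_t stages)
{
    if (instructions == 0) {
        return 0.0;
    }
    return (double)pipeline_total_clocks(instructions, stages)
         / (double)instructions;
}
```

With a three-stage pipeline, a single instruction still takes 3 clocks, but 1000 back-to-back instructions take 1002 clocks, an average of 1.002 clocks each.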

If you want to run an experiment on your processor, create a loop with a few hundred nops:

    beg = read timer
    load r0 = 100000
    top:
        nop
        nop
        nop
        ...
        nop
        r0 = r0 - 1
        bne top
    end = read timer

If the loop completes in a fraction of a second, either increase the number of nops or run an order of magnitude more iterations. What you really want is to cover a significant number of timer ticks: not necessarily seconds or minutes of wall-clock time, just a good number of timer ticks.

Then do the math and calculate the average.
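The averaging step might look like the following sketch. The helper name and its parameters are hypothetical, and it assumes a free-running 32-bit timer that counts up and wraps at most once between the two readings; for a down-counting timer the two readings would need to be swapped. Remember that the loop counter update and the branch count as instructions too.

```c
#include <stdint.h>

/* Hypothetical helper: turns two timer readings into average CPU clocks
 * per instruction.
 * Assumptions: the timer is 32 bits wide, counts up, and wrapped at most
 * once between `beg` and `end`. */
double average_clocks_per_instruction(uint32_t beg, uint32_t end,
                                      double cpu_hz, double timer_hz,
                                      uint32_t loops,
                                      uint32_t instructions_per_loop)
{
    /* Unsigned subtraction yields the correct tick count across one wrap. */
    uint32_t ticks = end - beg;

    /* Convert timer ticks to CPU clocks in case the timer runs slower. */
    double cpu_clocks = (double)ticks * (cpu_hz / timer_hz);

    double instructions = (double)loops * (double)instructions_per_loop;
    if (instructions == 0.0) {
        return 0.0;
    }
    return cpu_clocks / instructions;
}
```

For a loop of 300 nops plus the subtract and the branch, instructions_per_loop would be 302. The result is only an average over the whole loop; it says nothing about the latency of any single instruction.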

Repeat the experiment with the program running from RAM instead of ROM.

Slow the processor clock down to the fastest speed at which the flash needs no wait states, and repeat the run from flash.

Since this is a Cortex-M4 part, turn on the instruction cache, then repeat from flash and repeat from RAM (at 168 MHz).

If all of these experiments, using the same test loop, did not give you a range of different results, you are probably doing something wrong.

+3

If you carefully configure all of your clocks in the Reset and Clock Control (RCC) block and know every clock frequency, you can calculate the execution time accurately for most instructions and get at least a worst-case estimate for all of them. For example, I use the STM32F439ZI, a Cortex-M4 part compatible with the STM32F407. In the reference manual, the clock tree shows the PLL and all of the bus prescalers. In my case, I have an external 8 MHz clock with the PLL configured to provide an 84 MHz SYSCLK system clock. This means that one processor cycle is 1.0 / 84e6 ≈ 12 ns.
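The arithmetic from clock tree to cycle time can be sketched as below. The function names are made up for illustration. The formula is the STM32F4 main PLL relation (VCO = input × PLLN / PLLM, SYSCLK = VCO / PLLP), but the divider values in the usage note are an assumption: the answer does not say which ones it used, and the reference manual's limits on each divider are not checked here.

```c
#include <stdint.h>

/* SYSCLK in Hz when the main PLL drives the system clock:
 * f_VCO = f_in * PLLN / PLLM, SYSCLK = f_VCO / PLLP.
 * No range checking of the dividers is done in this sketch. */
uint32_t pll_sysclk_hz(uint32_t input_hz, uint32_t pllm,
                       uint32_t plln, uint32_t pllp)
{
    if (pllm == 0 || pllp == 0) {
        return 0;
    }
    /* Multiply before dividing, in 64 bits, to avoid losing precision. */
    uint64_t vco_hz = (uint64_t)input_hz * plln / pllm;
    return (uint32_t)(vco_hz / pllp);
}

/* Duration of one clock cycle in picoseconds, rounded down. */
uint32_t cycle_time_ps(uint32_t sysclk_hz)
{
    if (sysclk_hz == 0) {
        return 0;
    }
    return (uint32_t)(1000000000000ULL / sysclk_hz);
}
```

With an 8 MHz input and the assumed values PLLM = 8, PLLN = 336, PLLP = 4, pll_sysclk_hz returns 84000000 and cycle_time_ps returns 11904, i.e. about 12 ns. At 168 MHz the cycle time is 5952 ps, just under 6 ns.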

To find the number of cycles (SYSCLK periods) a single instruction takes, use the ARM® Cortex®-M4 Technical Reference Manual. For example, a MOV instruction takes one cycle in most cases. An ADD instruction also takes one cycle in most cases, which means that after about 12 ns the result of the addition is stored in a register and ready for use by another operation.

You can use this information to budget CPU resources in many cases, periodic interrupts for example. Electrical engineers and low-level software developers do talk about this, and do it, when hard real-time behavior and safety are at stake. During design, engineers typically work with worst-case execution times and ignore pipelining in order to get a quick, rough estimate of processor load. During implementation, you use tools for precise timing analysis and refine the software.
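A rough worst-case budget of the kind described above might be computed as follows. The function name and the cycle count in the usage note are hypothetical; in practice the worst-case cycle count comes from summing per-instruction worst cases plus interrupt entry and exit overhead.

```c
#include <stdint.h>

/* Worst-case CPU load, in tenths of a percent (per mille), of an interrupt
 * that fires `rate_hz` times per second and costs at most
 * `worst_case_cycles` clocks per invocation. Hypothetical helper. */
uint32_t isr_load_permille(uint32_t worst_case_cycles, uint32_t rate_hz,
                           uint32_t sysclk_hz)
{
    if (sysclk_hz == 0) {
        return 0;
    }
    uint64_t cycles_per_second = (uint64_t)worst_case_cycles * rate_hz;

    /* Round up so the estimate stays on the pessimistic side. */
    return (uint32_t)((cycles_per_second * 1000 + sysclk_hz - 1) / sysclk_hz);
}
```

For instance, a handler budgeted at 420 cycles and running at 10 kHz on an 84 MHz core gives 50 per mille, i.e. 5 % of the processor.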

Over the course of design and implementation, the non-deterministic effects are reduced until they are negligible.

+2

The number of clock cycles per instruction matters.

On an AVR, it's (usually) 1 instruction / clock, so a 12 MHz AVR runs at about 12 MIPS.

A PIC typically uses 1 instruction / 4 clocks, so a 12 MHz PIC runs at about 3 MIPS.

On the (original) 8051, it's 1 instruction / 12 clocks, so a 12 MHz 8051 runs at about 1 MIPS.

To know how much you can get done, instructions per clock is what matters. This is why an AMD processor can get more done per MHz than an Intel processor.
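The three figures above all follow from the same division, sketched here in C. The function name is made up for illustration, and the result is only a nominal rate: it assumes every instruction costs the same number of clocks, which the "usually" and "typically" above already warn is not strictly true.

```c
#include <stdint.h>

/* Nominal instruction rate in MIPS for a core that needs a fixed number of
 * clocks per instruction. Real cores have instructions that take longer. */
double nominal_mips(uint32_t clock_hz, uint32_t clocks_per_instruction)
{
    if (clocks_per_instruction == 0) {
        return 0.0;
    }
    return (double)clock_hz / (double)clocks_per_instruction / 1e6;
}
```

At 12 MHz this gives 12 for 1 clock per instruction, 3 for 4 clocks, and 1 for 12 clocks, matching the AVR, PIC, and 8051 numbers.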

+2

Source: https://habr.com/ru/post/1496930/

