How to delay an ARM Cortex-M0+ for n cycles without a timer?

I want to delay an ARM Cortex-M0+ for n cycles, without using a timer, and with the smallest possible code size. (I think this requires assembly.)

A delay of 0 cycles is simple: no code. A 1-cycle delay is one NOP. A 2-cycle delay is two NOPs.

At what point (in terms of code size) does it become worthwhile to start looping?

How many cycles does the tightest possible loop take? What is the setup time?


Notes in response to the answers:

The following C code:

    register unsigned char counter = 100;
    while (counter-- > 0) { asm(""); }

when compiled with gcc and -O3 gives:

        mov   r3, #100
    .L5:
        sub   r3, r3, #1
        uxtb  r3, r3
        cmp   r3, #0
        bne   .L5

This illustrates either that there is still a place for hand-coded ARM assembly, or (much more likely) that the C code above is not the best way to convey to the compiler what I want to do.

Comments?

+5
2 answers

The code will depend on exactly what n is and whether it needs to be dynamically variable, but given the instruction timings of the M0+ core, establishing bounds for a particular routine is fairly straightforward.

For the smallest possible (6-byte) complete loop with a fixed 8-bit immediate counter:

        movs r0, #NUM    ; 1 cycle
    1:  subs r0, r0, #1  ; 1 cycle
        bne  1b          ; 2 if taken, 1 otherwise

With NUM=1 you get a minimum of 3 cycles, plus 3 cycles for each additional iteration, up to 765 cycles at NUM=255 (of course, you could get 2^32 iterations from NUM=0, but that seems a bit silly). That puts the practical lower bound for using a loop at about 6 cycles.

With a fixed loop, it is easy to pad it with NOPs (or even nested loops) inside to lengthen each iteration, and before or after it to reach delays that are not an exact multiple of the loop length. If you can arrange for the iteration count to be ready in a register before you need to start waiting, you can drop the initial movs and get pretty much any multiple of three-or-more cycles, minus one. If you need single-cycle resolution for a variable delay, the initial setup cost will be a bit higher in order to correct for the remainder (a computed branch into a NOP sled is what I would do for that).
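As a rough sketch of the arithmetic above, the following C helper (meant to run on a host at build time, not on the target) splits a fixed delay of n cycles into a loop count plus NOP padding and falls back to plain NOPs when they are no larger. It assumes the cycle counts quoted in this answer, zero wait states, interrupts masked, and 2-byte Thumb encodings for every instruction involved. The names `loop_cycles`, `plan_delay` and `delay_plan` are made up for illustration.

```c
#include <stdint.h>

#define LOOP_BYTES 6u    /* MOVS + SUBS + BNE, 2 bytes each          */
#define NOP_BYTES  2u    /* one 16-bit Thumb NOP                     */
#define MAX_NUM    255u  /* largest 8-bit immediate for MOVS         */

/* Cycles for MOVS followed by num iterations of SUBS/BNE (num in 1..255):
   1 for MOVS, 3 for every iteration whose branch is taken, and 2 for the
   final iteration where BNE falls through. This simplifies to 3 * num. */
static uint32_t loop_cycles(uint32_t num)
{
    return 1u + (num - 1u) * 3u + 2u;
}

typedef struct {
    uint32_t num;    /* immediate for MOVS; 0 means "emit no loop"   */
    uint32_t nops;   /* NOPs to place before or after the loop       */
    uint32_t bytes;  /* resulting code size in bytes                 */
} delay_plan;

/* Plan a delay of exactly n cycles. Returns 0 on success, or -1 when n is
   too long for a single loop with an 8-bit counter. */
static int plan_delay(uint32_t n, delay_plan *plan)
{
    uint32_t num = n / 3u;   /* whole 3-cycle iterations              */
    uint32_t nops = n % 3u;  /* remainder made up with NOPs           */

    if (num > MAX_NUM)
        return -1;

    uint32_t loop_size = (num ? LOOP_BYTES : 0u) + nops * NOP_BYTES;
    uint32_t nop_size  = n * NOP_BYTES;

    if (nop_size <= loop_size) {
        /* Plain NOPs are no bigger and do not clobber a register. */
        plan->num = 0u;
        plan->nops = n;
        plan->bytes = nop_size;
    } else {
        plan->num = num;
        plan->nops = nops;
        plan->bytes = loop_size;
    }
    return 0;
}
```

Under these assumptions the loop first becomes strictly smaller than a run of NOPs at n = 6 (6 bytes against 12), which matches the "about 6 cycles" figure; for 3 to 5 cycles the two options come out the same size.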

I am assuming that if you are at the point of cycle-critical timing, you already have interrupts disabled (otherwise, add another cycle somewhere for CPSID), and that you have no bus wait states adding extra cycles to instruction fetches.

As for trying to do this in C: the fact that you need to hack in an empty asm to keep the "useless" loop from being optimized away is a hint. The abstract C machine has no notion of "instructions" or "cycles", so there is simply no way to express this reliably in the language. Relying on particular C constructs to compile to suitable instructions is extremely fragile: change a compiler flag, upgrade the compiler, or change some distant code that affects register allocation and therefore instruction selection, and the generated code can change unexpectedly. So I would say that hand-written assembly is the only sensible approach for cycle-accurate code.

+8

The shortest ARM loop I can think of is as follows:

        movs r0, #COUNT
    L:  subs r0, r0, #1
        bne  L

Since I do not have the device in question, I cannot say anything about the timings. They depend on the core.

+3

Source: https://habr.com/ru/post/1209258/

