Edit: there is a better answer from another SO Q&A here. However, that answer is in assembly; AFAIK, using a counter such as SysTick is the only way to guarantee any semblance of cycle accuracy.
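Edit 2: SysTick is a 24-bit down-counter. To avoid the counter wrapping around mid-delay (with a naive comparison this leads to a very long delay), clear the SysTick counter before use, i.e. SysTick->VAL = 0; (writing any value to VAL clears it to 0, and the counter reloads from LOAD on the next tick).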
Original:
Cortex-M cores have a built-in SysTick timer that can be used for cycle-accurate timing.
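Start the timer first. The reset value of the LOAD register is unknown, so set it explicitly as well (here to the full 24-bit range):
SysTick->LOAD = SysTick_LOAD_RELOAD_Msk; // reload from 0xFFFFFF
SysTick->CTRL = SysTick_CTRL_CLKSOURCE_Msk | SysTick_CTRL_ENABLE_Msk; // processor clock, no interrupt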
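A minimal start-up routine combining these steps, assuming CMSIS-style names (SysTick, SysTick_Type and the _Msk constants). So that it builds and runs on a host, those definitions are replaced here by a hypothetical mock struct using the CMSIS bit positions; on a real Cortex-M, delete the mock section and include the device header instead. systick_start is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical host-side mock of the CMSIS SysTick registers, so the
 * sketch builds off-target. On a real Cortex-M, delete this section and
 * include the device header, which defines SysTick and these masks. */
typedef struct {
    volatile uint32_t CTRL;  /* control and status */
    volatile uint32_t LOAD;  /* reload value */
    volatile uint32_t VAL;   /* current value */
    volatile uint32_t CALIB; /* calibration (read-only on hardware) */
} SysTick_Type;

static SysTick_Type systick_mock;
#define SysTick (&systick_mock)
#define SysTick_CTRL_ENABLE_Msk    (1UL << 0)
#define SysTick_CTRL_CLKSOURCE_Msk (1UL << 2)
#define SysTick_LOAD_RELOAD_Msk    0x00FFFFFFUL

/* Start SysTick as a free-running 24-bit down-counter clocked by the
 * processor clock, with the SysTick interrupt (TICKINT) left disabled. */
static void systick_start(void)
{
    SysTick->CTRL = 0;                        /* stop while configuring */
    SysTick->LOAD = SysTick_LOAD_RELOAD_Msk;  /* full 24-bit period */
    SysTick->VAL  = 0;                        /* on hardware, any write clears VAL */
    SysTick->CTRL = SysTick_CTRL_CLKSOURCE_Msk | SysTick_CTRL_ENABLE_Msk;
}
```

Stopping the counter before writing LOAD and VAL avoids it running with a half-configured reload value.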
You can then read the current count from the VAL register. Note that SysTick counts down, so the elapsed time is the start value minus the current value. A clock-cycle delay loop can then be implemented as follows:
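uint32_t start = SysTick->VAL; // SysTick counts down
while (((start - SysTick->VAL) & SysTick_VAL_CURRENT_Msk) < 30); // wait 30 ticks; the mask handles a wrap when LOAD = 0xFFFFFF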
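Here is a sketch of the same wrap-safe loop split into small functions so the arithmetic can be checked on a host. systick_elapsed, read_systick_val, delay_ticks and SYSTICK_MASK are hypothetical names; read_systick_val is a mock that decrements by one per read, whereas on a target it would simply return SysTick->VAL. The idea is that with LOAD = 0xFFFFFF the counter period is 2^24, so (start - now) masked to 24 bits gives the elapsed ticks even if the counter passed through 0 during the delay, as long as the delay is shorter than 2^24 ticks.

```c
#include <stdint.h>

#define SYSTICK_MASK 0x00FFFFFFUL  /* SysTick VAL/LOAD are 24 bits wide */

/* Ticks elapsed between two readings of a 24-bit DOWN-counter.
 * Assumes LOAD = 0xFFFFFF (period 2^24) and that fewer than 2^24 ticks
 * passed, so a single wrap from 0 back to LOAD is handled correctly. */
static inline uint32_t systick_elapsed(uint32_t start, uint32_t now)
{
    return (start - now) & SYSTICK_MASK;
}

/* Hypothetical counter source: on a target this would just return
 * SysTick->VAL. Here it is a host-side mock that counts down by one
 * tick per read and wraps from 0 to 0xFFFFFF, like the real counter. */
static uint32_t mock_val = 3;
static uint32_t read_systick_val(void)
{
    uint32_t v = mock_val;
    mock_val = (mock_val - 1u) & SYSTICK_MASK;
    return v;
}

/* Busy-wait until at least `ticks` SysTick ticks have elapsed. */
static void delay_ticks(uint32_t ticks)
{
    uint32_t start = read_systick_val();
    while (systick_elapsed(start, read_systick_val()) < ticks) {
        /* spin */
    }
}
```

For example, starting from a mock VAL of 3, delay_ticks(30) crosses zero partway through and still returns after 30 ticks of the mock counter instead of waiting for a full 2^24-tick period.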
Note that the load, compare, and branch in the loop add some overhead, so the actual delay will be slightly off (in practice a little longer than requested), by no more than a few ticks in my estimation.