GCC generates different code depending on the value of the array index

This code:

void blinkRed(void) { for(;;) { bb[0x0008646B] ^= 1; sys.Delay_ms(14); } } 

... compiled into asm code:

 08000470: ldr r4, [pc, #20] ; (0x8000488 <blinkRed()+24>)   // r4 = 0x422191ac
 08000472: ldr r6, [pc, #24] ; (0x800048c <blinkRed()+28>)
 08000474: movs r5, #14
 08000476: ldr r3, [r4, #0]
 08000478: eor.w r3, r3, #1
 0800047c: str r3, [r4, #0]
 0800047e: mov r0, r6
 08000480: mov r1, r5
 08000482: bl 0x80001ac <CSTM32F100C6::Delay_ms(unsigned int)>
 08000486: b.n 0x8000476 <blinkRed()+6>

This is normal.

But if I change only the array index (by -0x400)...

 void blinkRed(void) { for(;;) { bb[0x0008606B] ^= 1; sys.Delay_ms(14); } } 

...the code is no longer optimized the same way:

 08000470: ldr r4, [pc, #24] ; (0x800048c <blinkRed()+28>)   // r4 = 0x42218000
 08000472: ldr r6, [pc, #28] ; (0x8000490 <blinkRed()+32>)
 08000474: movs r5, #14
 08000476: ldr.w r3, [r4, #428] ; 0x1ac
 0800047a: eor.w r3, r3, #1
 0800047e: str.w r3, [r4, #428] ; 0x1ac
 08000482: mov r0, r6
 08000484: mov r1, r5
 08000486: bl 0x80001ac <CSTM32F100C6::Delay_ms(unsigned int)>
 0800048a: b.n 0x8000476 <blinkRed()+6>

The difference is that in the first case r4 is loaded directly with the target address (0x422191ac) and memory is accessed with 2-byte (16-bit) instructions, whereas in the second case r4 is loaded with an intermediate address (0x42218000) and memory is accessed with 4-byte (32-bit) instructions that add an offset (+0x1ac) to reach the target address (0x422181ac).

Why does the compiler do this?

I compile with: arm-none-eabi-g++ -mcpu=cortex-m3 -mthumb -g2 -Wall -O1 -std=gnu++14 -fno-exceptions -fno-use-cxa-atexit -fstrict-volatile-bitfields -c -DSTM32F100C6T6B -DSTM32F10X_LD_VL

bb is declared as:

 __attribute__ ((section(".bitband"))) volatile u32 bb[0x00800000]; 

In the linker script (.ld) it is defined as follows. In the MEMORY section:

 BITBAND(rwx): ORIGIN = 0x42000000, LENGTH = 0x02000000 

In the SECTIONS section:

 .bitband (NOLOAD) : SUBALIGN(0x02000000) { KEEP(*(.bitband)) } > BITBAND 
1 answer

I would consider this an artifact of -O1, i.e. a missed optimization opportunity.

This can be understood in more detail if we look at the code generated with -O0 to load bb[...]:

The first case:

 movw r2, #:lower16:bb
 movt r2, #:upper16:bb
 movw r3, #37292
 movt r3, 33
 adds r3, r2, r3
 ldr r3, [r3, #0]

Second case:

 movw r3, #:lower16:bb
 movt r3, #:upper16:bb
 add r3, r3, #2195456 ; 0x218000 = 4*0x86000
 add r3, r3, #428
 ldr r3, [r3, #0]

The code in the second case is better. It can be generated this way because the constant offset 0x2181AC can be added with two add instructions whose immediates (0x218000 and 428) are both encodable; this is not possible for index 0x0008646B, whose offset is 0x2191AC.

-O1 performs only optimizations that are cheap in compile time. Apparently it folds the second add into the ldr early on (as the [r4, #428] offset), and therefore later misses the opportunity to load the entire address with a single PC-relative ldr.

Compile with -O2 (or add -fgcse) and the code looks as expected.


Source: https://habr.com/ru/post/986924/

