As others have said, these are items external to the processor core itself. It can be DRAM, or it can be a memory-mapped peripheral (a UART status register, a timer register, etc.).
    #define SOME_STATUS_REGA (*((volatile unsigned int *)0x10008000))
    void fun ( void )
    {
        while(SOME_STATUS_REGA==0) continue;
    }
for one target and toolchain produces
    00000000 <fun>:
       0:   e59f200c    ldr   r2, [pc, #12]   ; 14 <fun+0x14>
       4:   e5923000    ldr   r3, [r2]
       8:   e3530000    cmp   r3, #0
       c:   0afffffc    beq   4 <fun+0x4>
      10:   e12fff1e    bx    lr
      14:   10008000    andne r8, r0, r0

    00000018 <more_fun>:
      18:   e59f300c    ldr   r3, [pc, #12]   ; 2c <more_fun+0x14>
      1c:   e5933000    ldr   r3, [r3]
      20:   e3530000    cmp   r3, #0
      24:   112fff1e    bxne  lr
      28:   eafffffe    b     28 <more_fun+0x10>
      2c:   10008000    andne r8, r0, r0
You can see in more_fun, the non-volatile case (the same loop reading the same address, but without the volatile qualifier), that it reads the location once, does the comparison once, and then goes into an infinite loop. The compiler did what we told it to: as far as it can tell, nothing can change the variable, so there is no reason to burn clock cycles re-reading something that will not change. If the value was zero on that single read it will never become non-zero, so the code falls into an infinite loop.
If you make it volatile, you "ask" the compiler to perform the read or write every time your code accesses it. You can see this in the fun case: every pass through the loop goes back and reads that address to see whether it has changed. The volatile keyword is what distinguishes these two behaviors.
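A minimal sketch of the same polling idea written as a reusable helper. The name wait_nonzero and the max_polls limit are illustrative assumptions, not part of the code above; the point is only that the volatile-qualified pointer forces one real read per pass through the loop.

```c
#include <stdbool.h>

/* Hypothetical helper: poll a status register until it reads non-zero,
 * giving up after max_polls reads. Because reg points to a volatile
 * object, the compiler has to perform the load on every iteration
 * instead of reading once and reusing the value.
 * Returns true if a non-zero value was seen, false on timeout. */
static bool wait_nonzero(volatile unsigned int *reg, unsigned long max_polls)
{
    while (max_polls-- > 0) {
        if (*reg != 0)
            return true;
    }
    return false;
}
```

On a target you would pass something like (volatile unsigned int *)0x10008000; on a host machine an ordinary variable can stand in for the register when trying it out.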
It does not have to be hardware that changes these values. If you use a global variable to communicate between ISR code and the foreground, that variable in memory can be changed by the ISR and/or the foreground code, so both must treat it as volatile.
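A small sketch of the ISR/foreground case, assuming a hosted C environment where a signal handler stands in for the interrupt service routine. The names event_seen, on_signal and wait_for_event are made up for illustration; on bare metal the handler would be a real ISR, but the shared flag would be declared the same way.

```c
#include <signal.h>

/* Flag shared between the handler and the foreground code. It is
 * volatile so the foreground loop re-reads it on every pass. */
static volatile sig_atomic_t event_seen = 0;

/* Stand-in for an ISR: records that the event happened. */
static void on_signal(int signo)
{
    (void)signo;
    event_seen = 1;
}

/* Installs the handler, triggers it once, then waits in the
 * foreground until the flag has been set.
 * Returns 1 on success, 0 if the handler could not be installed. */
static int wait_for_event(void)
{
    event_seen = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return 0;
    raise(SIGINT);              /* the handler runs before raise() returns */
    while (event_seen == 0)     /* foreground poll of the shared flag */
        continue;
    return 1;
}
```

Without the volatile qualifier an optimizer would be free to treat the foreground loop the same way it treated more_fun.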
You may also be dealing with a multi-core / multi-threaded processor where each core / thread independently has access to shared resources. Not only do you need volatile in this situation, you may also need to make that RAM non-cached if the cores do not share the same cache, and you may need hardware and/or software locks if atomic operations are required (ldrex / strex in the ARM world are the first step toward this).
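As a rough illustration of the "software lock" idea, here is a sketch of a test-and-set spinlock, assuming a C11 toolchain that provides <stdatomic.h>. The type and function names are invented for this example. On ARMv7 targets a compiler will typically implement the test-and-set with an ldrex/strex loop, but that is an implementation detail and not something this code guarantees.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock type built on the C11 atomic_flag. */
typedef struct {
    atomic_flag held;
} demo_spinlock;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try to take the lock once. test_and_set returns the previous
 * state, so "false" means the lock was free and is now ours. */
static bool spin_trylock(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until the lock is acquired. */
static void spin_lock(demo_spinlock *l)
{
    while (!spin_trylock(l))
        continue;
}

/* Release the lock so another core / thread can take it. */
static void spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The atomic operations here do a different job from volatile: they make the read-modify-write indivisible and order the surrounding memory accesses, which volatile alone does not.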
EDIT
Another demonstration: the problem is not only with reads but also with writes. Suppose you have a peripheral where you need to write a configuration register to set up some mode, then write it again to enable that mode. Or you have a hardware interface where each write increments a logical pointer, and you perform a series of writes to do something.
    #define SOMETHING1 (*((volatile unsigned char *)0x10002000))
    void fun ( void )
    {
        SOMETHING1=5;
        SOMETHING1=5;
        SOMETHING1=6;
    }

    #define SOMETHING2 (*((unsigned char *)0x10002000))
    void more_fun ( void )
    {
        SOMETHING2=5;
        SOMETHING2=5;
        SOMETHING2=6;
    }
Without the volatile, that peripheral will not work properly. Multiple writes to the same pointer / address are considered dead code and are optimized out, leaving only the last one.
    00000000 <fun>:
       0:   e3a02005    mov   r2, #5
       4:   e3a01006    mov   r1, #6
       8:   e59f300c    ldr   r3, [pc, #12]   ; 1c <fun+0x1c>
       c:   e5c32000    strb  r2, [r3]
      10:   e5c32000    strb  r2, [r3]
      14:   e5c31000    strb  r1, [r3]
      18:   e12fff1e    bx    lr
      1c:   10002000    andne r2, r0, r0

    00000020 <more_fun>:
      20:   e3a02006    mov   r2, #6
      24:   e59f3004    ldr   r3, [pc, #4]    ; 30 <more_fun+0x10>
      28:   e5c32000    strb  r2, [r3]
      2c:   e12fff1e    bx    lr
      30:   10002000    andne r2, r0, r0
EDIT2
Clang / llvm also demonstrates the problem
produces
    00000000 <afun>:
       0:   e3a00a02    mov   r0, #8192       ; 0x2000
       4:   e3a01004    mov   r1, #4
       8:   e3800201    orr   r0, r0, #268435456  ; 0x10000000
       c:   e5c01000    strb  r1, [r0]
      10:   e3a01005    mov   r1, #5
      14:   e5c01000    strb  r1, [r0]
      18:   e3a01006    mov   r1, #6
      1c:   e5c01000    strb  r1, [r0]
      20:   e5d01000    ldrb  r1, [r0]
      24:   e3811001    orr   r1, r1, #1
      28:   e5c01000    strb  r1, [r0]
      2c:   e5d01000    ldrb  r1, [r0]
      30:   e3510000    cmp   r1, #0
      34:   0afffffc    beq   2c <afun+0x2c>
      38:   e12fff1e    bx    lr

    0000003c <bfun>:
      3c:   e3a00a02    mov   r0, #8192       ; 0x2000
      40:   e3a01007    mov   r1, #7
      44:   e3800201    orr   r0, r0, #268435456  ; 0x10000000
      48:   e5c01000    strb  r1, [r0]
      4c:   e12fff1e    bx    lr
Whether or not you add volatile makes no difference if what you do is not in a domain the optimizer can touch (one write to each register in a particular sequence, or a single read of a register, which also implies no loops). Its absence will definitely hurt you if you do more than one write to the same register (which happens often when setting up a peripheral) or a read-modify-write (x |= something, y &= something, z ^= something, etc.).
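The read-modify-write patterns mentioned here can be wrapped in small helpers. This is only a sketch: the function names and the 32-bit register width are assumptions, not anything taken from the listings above. Each helper does exactly one volatile read followed by one volatile write.

```c
#include <stdint.h>

/* Hypothetical helpers for read-modify-write access to a register.
 * The volatile qualifier on the pointer keeps the compiler from
 * merging or dropping the accesses when several calls hit the same
 * address in a row. */

/* x |= something */
static inline void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    uint32_t value = *reg;   /* volatile read */
    value |= mask;
    *reg = value;            /* volatile write */
}

/* y &= ~something */
static inline void reg_clear_bits(volatile uint32_t *reg, uint32_t mask)
{
    uint32_t value = *reg;
    value &= ~mask;
    *reg = value;
}

/* z ^= something */
static inline void reg_toggle_bits(volatile uint32_t *reg, uint32_t mask)
{
    uint32_t value = *reg;
    value ^= mask;
    *reg = value;
}
```

On a target the argument would be a cast address such as (volatile uint32_t *)0x10002000; on a host an ordinary variable can stand in for the register. Note that these sequences are not atomic, so an interrupt or another core can still get in between the read and the write.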
If you use a toolchain that has no optimizer, or you choose not to optimize, you will not see this problem, but that code is not portable. If you leave the volatiles off, you will eventually get into trouble with variables / code that cross compile domains or other similar domains (hardware is a separately compiled domain from software).