Volatile under the hood

I would like some help understanding part of the following passage:
"The volatile keyword qualifier indicates that a variable can be changed outside of the program. For example, an external device may write to a port. Compilers will sometimes temporarily use a cache, or register, to hold the value in a memory location for optimization purposes. If an external write modifies the memory location, then this change will not be reflected in the cached or register value." (From the book Understanding and Using C Pointers, pp. 178-179)

I see an ambiguity between these two phrases: "hold the value in a memory location" and "if an external write modifies the memory location".

My problem: it seems that if an external device writes data to the port, this data is stored somewhere (???), then it is saved in a register / cache (??), and only then does it end up in the variable in the C source. I must have misunderstood something. As far as I know, the normal workflow should be: external device → small temporary buffer → variable in RAM (when data goes from the device to the MCU's RAM).

    #define PORT 0xB0000000
    unsigned int volatile * const port = (unsigned int*) PORT;

    *port = 0x0BF4;   // write to port
    value = *port;    // read from port
+5
3 answers

Memory-mapped I/O devices do not go through the CPU's registers (or, usually, its cache). That is why they are external: they just sit somewhere on the memory bus, pretending to be memory.

The values of such a device show up directly in what (to the CPU) looks like memory.

In your example, this:

 *port = 0x0BF4; // write to port 

might cause an A/D converter to start a conversion, and this

 value = *port; // read from port 

might read out the resulting value. This is not a very typical design (A/D converters are usually somewhat more complicated than that), but it is possible.

If the compiler thought "hey, that is just a read from the location this value was just written to", it could replace the two statements with

 value = 0x0BF4; // "optimized", but broken since no more I/O occurs 

That would ruin your day if you were trying to read values from this ADC.

Declaring the location volatile tells the compiler not to make any assumptions about the side effects of accessing it.

If you look at something like an STM32F4 ARM microcontroller, it has tons of memory-mapped I/O (serial ports, a USB controller, Ethernet, timers, A/D and D/A converters, ... they are all there), plus a lot of internal (core-related, but still memory-mapped) things.
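As a rough illustration of what such memory-mapped access tends to look like in C, here is a minimal sketch. The register layout, the `FAKE_UART_RX_READY` bit and every name in it are hypothetical; they are not taken from the STM32F4 or any other real part. A static variable stands in for the device so that the code can run on a desktop machine.

```c
#include <stdint.h>

/* Hypothetical register block; a real device defines its own layout. */
typedef struct {
    volatile uint32_t status; /* bit 0 set = data ready (assumed meaning) */
    volatile uint32_t data;   /* received byte in the low 8 bits (assumed) */
} fake_uart_regs;

#define FAKE_UART_RX_READY 0x1u

/* Stand-in for the hardware. On a real MCU this would instead be a fixed
   address taken from the data sheet, cast to (fake_uart_regs *). */
static fake_uart_regs fake_uart_storage;
static fake_uart_regs *const FAKE_UART = &fake_uart_storage;

/* Busy-wait until the status register reports data, then read it.
   Because the fields are volatile, status is re-read on every pass. */
static uint8_t fake_uart_read_byte(fake_uart_regs *uart)
{
    while ((uart->status & FAKE_UART_RX_READY) == 0u) {
        /* spin until the (simulated) device sets the ready bit */
    }
    return (uint8_t)(uart->data & 0xFFu);
}
```

With the ready bit already set in `FAKE_UART->status`, `fake_uart_read_byte(FAKE_UART)` returns the low byte of `FAKE_UART->data` immediately; on real hardware the device, not the program, would be the one setting that bit.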

+5

As others have said, these are things that are external to the CPU core itself. It can be DRAM, or it can be a memory-mapped peripheral (a UART status register, say, or a timer register, etc.).

    #define SOME_STATUS_REGA (*((volatile unsigned int *)0x10008000))
    void fun ( void )
    {
        while(SOME_STATUS_REGA==0) continue;
    }

    #define SOME_STATUS_REGB (*((unsigned int *)0x10008000))
    void more_fun ( void )
    {
        while(SOME_STATUS_REGB==0) continue;
    }

which, for one target and toolchain, produces

    00000000 <fun>:
       0:   e59f200c    ldr   r2, [pc, #12]   ; 14 <fun+0x14>
       4:   e5923000    ldr   r3, [r2]
       8:   e3530000    cmp   r3, #0
       c:   0afffffc    beq   4 <fun+0x4>
      10:   e12fff1e    bx    lr
      14:   10008000    andne r8, r0, r0

    00000018 <more_fun>:
      18:   e59f300c    ldr   r3, [pc, #12]   ; 2c <more_fun+0x14>
      1c:   e5933000    ldr   r3, [r3]
      20:   e3530000    cmp   r3, #0
      24:   112fff1e    bxne  lr
      28:   eafffffe    b     28 <more_fun+0x10>
      2c:   10008000    andne r8, r0, r0

You can see that more_fun, the non-volatile case, reads the location once and does the comparison once, but can then get stuck in an infinite loop. The compiler did what we told it to, because as far as it can tell there is no way for that variable to change. There is no reason to waste clock cycles re-reading something that will not change, so if the value was zero at the one and only read, it will always be zero, and the code falls into an infinite loop.

If you make it volatile, you are "asking" the compiler to actually read or write the location every time your code accesses it. You can see this in the fun case: every pass through the loop goes back and reads that address to see whether it has changed. The volatile keyword is what distinguishes the two behaviors.

It does not have to be hardware that changes these values. If you use a global variable to communicate between ISR code and foreground code, that variable in memory can be changed by the ISR and/or by the foreground code, so both need to treat it as volatile.

You may also be dealing with a multi-core/multi-threaded processor, where each core/thread independently accesses shared resources. Not only do you need volatile in this situation, but you may also need that RAM to be uncached if the cores do not share the same cache, and you may need hardware and/or software locks if atomic operations are required (ldrex/strex in the ARM world is a first step for this).
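To give a flavor of the "software lock" mentioned above, here is a minimal spinlock sketch. It assumes a C11 compiler that provides `<stdatomic.h>`; the names `counter_lock`, `shared_counter` and `locked_increment` are made up for illustration. On ARM cores that have them, a compiler will typically lower these atomic operations to exclusive load/store instructions such as ldrex/strex, but that depends on the target.

```c
#include <stdatomic.h>

/* Hypothetical shared state protected by a simple spinlock. */
static atomic_flag counter_lock = ATOMIC_FLAG_INIT;
static unsigned shared_counter;

static void spin_lock(void)
{
    /* test_and_set returns the previous value: keep spinning while
       somebody else already holds the lock. */
    while (atomic_flag_test_and_set_explicit(&counter_lock,
                                             memory_order_acquire)) {
        /* busy-wait */
    }
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&counter_lock, memory_order_release);
}

/* Increment the shared counter under the lock and return the new value. */
static unsigned locked_increment(void)
{
    spin_lock();
    unsigned value = ++shared_counter;
    spin_unlock();
    return value;
}
```

Called from a single thread this just counts 1, 2, 3, ...; the point of the lock is that the same function could be called from several cores without two increments interleaving.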

EDIT

Another demonstration: the problem is not only with reads but also with writes. Suppose you have a peripheral where you need to write a configuration register to set up some mode and then write it again to enable that mode. Or you have a hardware interface where each write increments a logical pointer, and you perform a series of writes to do something.

    #define SOMETHING1 (*((volatile unsigned char *)0x10002000))
    void fun ( void )
    {
        SOMETHING1=5;
        SOMETHING1=5;
        SOMETHING1=6;
    }

    #define SOMETHING2 (*((unsigned char *)0x10002000))
    void more_fun ( void )
    {
        SOMETHING2=5;
        SOMETHING2=5;
        SOMETHING2=6;
    }

Without the volatile, that peripheral is not going to work properly. Multiple writes to the same pointer/address are considered dead code and are optimized out.

    00000000 <fun>:
       0:   e3a02005    mov   r2, #5
       4:   e3a01006    mov   r1, #6
       8:   e59f300c    ldr   r3, [pc, #12]   ; 1c <fun+0x1c>
       c:   e5c32000    strb  r2, [r3]
      10:   e5c32000    strb  r2, [r3]
      14:   e5c31000    strb  r1, [r3]
      18:   e12fff1e    bx    lr
      1c:   10002000    andne r2, r0, r0

    00000020 <more_fun>:
      20:   e3a02006    mov   r2, #6
      24:   e59f3004    ldr   r3, [pc, #4]    ; 30 <more_fun+0x10>
      28:   e5c32000    strb  r2, [r3]
      2c:   e12fff1e    bx    lr
      30:   10002000    andne r2, r0, r0

EDIT2

Clang/LLVM also demonstrates the problem

    #define A (*((volatile unsigned char *)0x10002000))
    void afun ( void )
    {
        A = 4;
        A = 5;
        A = 6;
        A |= 1;
        while(A==0) continue;
    }

    #define B (*((unsigned char *)0x10002000))
    void bfun ( void )
    {
        B = 4;
        B = 5;
        B = 6;
        B |= 1;
        while(B==0) continue;
    }

Producing

    00000000 <afun>:
       0:   e3a00a02    mov   r0, #8192       ; 0x2000
       4:   e3a01004    mov   r1, #4
       8:   e3800201    orr   r0, r0, #268435456 ; 0x10000000
       c:   e5c01000    strb  r1, [r0]
      10:   e3a01005    mov   r1, #5
      14:   e5c01000    strb  r1, [r0]
      18:   e3a01006    mov   r1, #6
      1c:   e5c01000    strb  r1, [r0]
      20:   e5d01000    ldrb  r1, [r0]
      24:   e3811001    orr   r1, r1, #1
      28:   e5c01000    strb  r1, [r0]
      2c:   e5d01000    ldrb  r1, [r0]
      30:   e3510000    cmp   r1, #0
      34:   0afffffc    beq   2c <afun+0x2c>
      38:   e12fff1e    bx    lr

    0000003c <bfun>:
      3c:   e3a00a02    mov   r0, #8192       ; 0x2000
      40:   e3a01007    mov   r1, #7
      44:   e3800201    orr   r0, r0, #268435456 ; 0x10000000
      48:   e5c01000    strb  r1, [r0]
      4c:   e12fff1e    bx    lr

Adding volatile is not going to hurt you if you are doing things that the compiler could not optimize away anyway (a single write to each register in some sequence, a single read of a register; "single" also implies there are no loops). Leaving it off will definitely hurt you if you do more than one write to the same register (which happens often when setting up a peripheral), and that includes read-modify-writes (x |= something, y &= something, z ^= something, etc.).
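The read-modify-write operations mentioned above are often wrapped in small helpers that take a pointer to volatile, so that every call performs a real read followed by a real write. A minimal sketch follows; the helper names are hypothetical and not part of any vendor library.

```c
#include <stdint.h>

/* Each helper reads the register, modifies the value and writes it back.
   The volatile-qualified pointer keeps the compiler from merging or
   dropping these accesses. */
static inline void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg |= mask;
}

static inline void reg_clear_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg &= ~mask;
}

static inline void reg_toggle_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg ^= mask;
}
```

On a target these would be called with a register address cast to `volatile uint32_t *`; on a desktop, the address of an ordinary `volatile uint32_t` variable works for experimenting. Note that a read-modify-write like this is not atomic, so an interrupt can still land between the read and the write.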

If you use a toolchain that has no optimizer, or you choose not to optimize, you will not have this problem, but code that leaves off the volatiles is not portable. You will eventually get into trouble if you never use volatile on variables/code that cross compilation or similar domains (hardware is a separate compile domain from software).

+4

Before C added the "volatile" keyword, every access to an object that did not have the register qualifier would result in a load from or a store to the object's address. Given the declarations int i,j; , the code:

 i+=j; j+=i; i+=j; 

would load i and j from memory, add them and store the result to i . Then it would load i and j from memory again, add them and store the result to j . Finally, it would load i and j from memory a third time, add them and store the result to i . Thus, the three statements would result in six loads, three additions and three stores.

If there is nothing special about i and j , something like the following will be more efficient:

    register int t1,t2;
    t1=i; t2=j;
    t1+=t2; t2+=t1; t1+=t2;
    i=t1; j=t2;

Although this looks like more code, operations on t1 and t2 do not require loads and stores. Thus, the compiler would have to generate two loads, three additions and two stores, saving the cost of four loads and one store compared to the original.
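For anyone who wants to check that the two forms really compute the same thing, here is a small runnable sketch. The file-scope variables `demo_i` and `demo_j` and the two function names are made up for illustration; they stand in for the i and j above.

```c
/* Stand-ins for the i and j of the text. */
static int demo_i, demo_j;

/* The straightforward form: conceptually six loads and three stores. */
static void add_via_memory(void)
{
    demo_i += demo_j;
    demo_j += demo_i;
    demo_i += demo_j;
}

/* The hand-optimized form: two loads, three additions, two stores. */
static void add_via_temporaries(void)
{
    register int t1, t2;
    t1 = demo_i;
    t2 = demo_j;
    t1 += t2;
    t2 += t1;
    t1 += t2;
    demo_i = t1;
    demo_j = t2;
}
```

Starting from demo_i = 1 and demo_j = 2, both functions leave demo_i == 8 and demo_j == 5. The results only agree because nothing else modifies the variables in the meantime, which is exactly the assumption that breaks down for hardware registers and interrupt-shared data.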

Having a compiler automatically turn the former kind of code into the latter would be useful, except for one problem: sometimes things that look like variables can be changed in ways the compiler does not know about. This can happen either because circuitry other than memory is attached to the memory bus (many systems have I/O devices that are wired to respond when code tries to read or write certain addresses), or because the machine can respond to external stimuli by temporarily suspending what it is doing, running a special piece of code called an interrupt handler, and then resuming what it was doing when the interrupt handler returns. Interrupt handlers often read and write variables that the main code can also access (indeed, that is one of the reasons they exist), but if the code does something like:

 while(!data_received) ; 

and relies on the interrupt handler setting data_received as soon as data becomes available, such code may fail if the compiler replaces it with:

    t1 = data_received; while(!t1) ;

which will execute the loop "faster", but will never exit the loop when data is received.
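The interrupt-handler pattern can be imitated on a desktop with a standard C signal handler. The following is only a sketch under that assumption: a signal plays the role of the interrupt, SIGINT is an arbitrary choice, and `on_signal` and `wait_for_data` are hypothetical names.

```c
#include <signal.h>

/* Flag shared between the "interrupt handler" and the main code.
   volatile sig_atomic_t is the type standard C guarantees can be
   written safely from a signal handler. */
static volatile sig_atomic_t data_received = 0;

/* Plays the role of the interrupt handler. */
static void on_signal(int signum)
{
    (void)signum;
    data_received = 1;
}

/* Installs the handler, triggers the "interrupt" and waits for the flag.
   Returns 1 on success, -1 if the handler could not be set up. */
static int wait_for_data(void)
{
    data_received = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    if (raise(SIGINT) != 0)
        return -1;
    while (!data_received)
        ;   /* volatile forces a fresh read of the flag on every pass */
    signal(SIGINT, SIG_DFL);
    return 1;
}
```

Here raise() runs the handler before it returns, so the loop exits at once; with a real asynchronous interrupt the loop would spin until the handler fires, which is only guaranteed to be noticed because the flag is volatile.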

The purpose of volatile is to tell the compiler that certain objects require special handling. Some compilers (reasonably, IMHO) interpret volatile as an indication that an access to an object so marked may arbitrarily affect anything in the system in ways the compiler does not know about, which makes it possible to write constructs such as:

    extern volatile char * volatile dma_mem;
    extern volatile unsigned dma_count, dma_command, dma_busy;

    void put_data(char *data, unsigned size)
    {
        dma_mem = data;
        dma_count = size;
        // Following will trigger hardware to automatically copy "dma_count"
        // bytes from memory starting at "dma_mem"; dma_busy will read as
        // zero once operation is complete.
        dma_command = OUTPUT_MEMORY;  // Exact value depends on hardware
        while(dma_busy)
            ;
    }

On a compiler that refrains from caching anything in registers across volatile accesses, a function such as the one above can be used to output data from "normal" memory, provided that all external accesses complete before the function returns. However, if the compiler keeps data in registers even across volatile accesses in the name of optimization, such code may fail unless the buffer holding the data is also volatile-qualified.

PS - while volatile can be, and often is, used for I/O accesses, it is often not needed there (*) nearly as much as it is for things that are affected by interrupts. In many cases, I/O addresses have been defined using constructs such as

    #define PORTA (*(unsigned char*)0xD000)
    #define PORTB (*(unsigned char*)0xD002)

and while the Standard does not require a compiler to treat such addresses as volatile, many compilers do so anyway, because a programmer's use of such addresses implies that the programmer knows things the compiler does not. By contrast, flags that are set by interrupt handlers look to the compiler like ordinary RAM, and it is only the volatile qualifier that indicates there is anything special about them.

(*) I have seen many vendor header files that do not use volatile for I/O addresses. If a compiler generates the same code with or without the keyword, adding more tokens for it to chew through on every build slows compilation for no purpose. The authors of the Standard deliberately refrained from requiring that all compilers be suitable for embedded or systems programming, and therefore made no effort to forbid behavior that would make compilers unsuitable for such purposes. Code for a specific purpose only needs to work on compilers that are suitable for that purpose; if such code fails on a compiler that has been intentionally made less suitable for the purpose, that does not mean the code is "broken". It means the compiler is no longer suitable for use with such code.

PS. For a compiler to make any useful optimization based on a constant address not being volatile , it would have to either "know" that no other object will be observed to have the same address, or accept the possibility that even if two integers x and y are equal, a write to one of *(uint8_t*)x and *(uint8_t*)y may not be recognized as affecting the other. Since the Standard says that converting a pointer to an integer and back yields something that "compares equal" to the original pointer, but does not say that it can actually be used for any purpose, such behavior would be conforming, but surprising.

Consider, for example, the following program consisting of two separate translation units [assuming that the required headers are included]

    // UNIT ONE
    extern unsigned char foo;
    extern uintptr_t volatile tfoo;
    int foo_addr(void)
    {
        tfoo = (uintptr_t)&foo;
        return tfoo == 0x12345678;
    }

    // UNIT TWO
    int foo_addr(void);
    unsigned char foo;
    uintptr_t volatile tfoo;
    int main(void)
    {
        int ok = foo_addr();
        foo = 2;
        if (ok)
            *(unsigned char*)0x12345678 = 4;
        return ok + foo;
    }

If foo is not assigned the address 0x12345678, nothing will be written to address 0x12345678, and the code will return 2 (ok is 0 and foo is 2). If the address of foo is 0x12345678, then (unsigned char*)0x12345678 would be a legitimate pointer to foo , and the compiler should be required to recognize the access unless it decides that it does not want to honor round-trip pointer-to-integer conversions that yield usable pointers.

The easiest way for a compiler to treat (unsigned char*)0x12345678 as aliasing everything it would need to alias is to treat it as volatile and refrain from caching in registers anything whose address has been exposed. Useful optimizations from treating such an access as not being volatile will be rare unless the compiler is willing to bend pointer semantics.
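The round trip that this argument depends on can be tried out directly. The sketch below is a simplified single-file variant, not the two-unit program above: it converts the address of `foo` to a `uintptr_t` and back, then writes through the resulting pointer. The function name is hypothetical, and the sketch assumes an implementation that provides `uintptr_t`.

```c
#include <stdint.h>

static unsigned char foo;

/* Converts &foo to an integer and back, writes through the result,
   then reads foo directly. */
static int write_through_round_trip(void)
{
    /* The Standard guarantees the converted-back pointer compares
       equal to the original when going through void *. */
    uintptr_t as_integer = (uintptr_t)(void *)&foo;
    unsigned char *p = (unsigned char *)(void *)as_integer;

    foo = 2;
    *p = 4;     /* the compiler must treat this as a possible write to foo */
    return foo;
}
```

Mainstream compilers return 4 here, because they treat the round-tripped pointer as a usable pointer to foo. The question raised above is what a compiler is entitled to assume when the integer arrives as a literal constant instead of being derived from &foo in plain sight.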

+1

Source: https://habr.com/ru/post/1264630/

