Theoretically, volatile is not enough. There are two layers of abstraction:
- between the actions in the source code and the actual opcodes;
- between what one core/processor sees and what the other cores/processors see.
The compiler can cache data in a register and reorder reads and writes. By using volatile, you instruct the compiler to produce opcodes that perform the reads and writes of that variable exactly in the order specified in the source code. But this only deals with the first layer. The hardware that manages communication between processor cores can also delay and reorder reads and writes.
It so happens that on x86 hardware, cores propagate writes to main memory fairly quickly, and other cores are automatically notified that memory has changed. So in practice volatile is enough there: it guarantees that the compiler will not play funky games with registers, and the memory system is kind enough to handle things from that point. Note, however, that this is not true of all systems (I think at least some Sparc systems could delay the propagation of writes for arbitrary periods, possibly hours), and I read in one of AMD's manuals that AMD explicitly reserves the right to propagate writes less promptly in some future processors.
Thus, the clean solution is to use a mutex (pthread_mutex_lock() on Unix, EnterCriticalSection() on Windows) whenever you access your global variable (for both reading and writing). The mutex primitives include a special operation known as a memory barrier, which is like volatile on steroids (it acts like volatile for both layers of abstraction).