it is used to make sure that the value will be read / written as a whole
That is only a small part of atomicity. At its core it means "uninterruptible": an instruction whose side effects cannot be interleaved with another instruction. By design, a memory update is atomic when it can be performed in a single memory-bus cycle. That requires the address of the memory location to be aligned so that one cycle can update it. A misaligned access requires extra work: part of the bytes are written in one bus cycle and part in another. It is then no longer uninterruptible.
Getting aligned updates is fairly easy; it is a guarantee provided by the compiler, or, more broadly, by the memory model the compiler implements. It simply picks memory addresses that are aligned, sometimes deliberately leaving a few bytes of unused space to get the next variable aligned. An update to a variable that is larger than the processor's native word size can never be atomic.
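A minimal sketch of that distinction, using std::atomic (C++11): is_lock_free() reports whether the platform can update the type in a single uninterruptible operation. Big is a hypothetical struct chosen only to exceed the native word size.

```cpp
#include <atomic>
#include <cstdint>
#include <iostream>

// Hypothetical struct, deliberately larger than any native word size.
struct Big { std::int64_t a, b, c; };

int main() {
    std::atomic<std::int32_t> small{0};
    std::atomic<Big> big{Big{}};

    // Typically 1 on mainstream platforms: fits in one bus cycle.
    std::cout << "int32 lock-free: " << small.is_lock_free() << '\n';

    // Typically 0: the library falls back to an internal lock.
    std::cout << "Big lock-free:   " << big.is_lock_free() << '\n';
}
```

On gcc/clang this may need linking against libatomic for the non-lock-free case.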
More important, though, is which processor instructions you need to implement threading. Every processor implements a variant of the CAS instruction: compare-and-swap. It is the core instruction needed for synchronization. All of the higher-level synchronization primitives are built on top of this basic instruction: monitors (a.k.a. condition variables), mutexes, events, critical sections, and semaphores.
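To make that concrete, here is a minimal sketch (not from the original text) of a spinlock built directly on compare-and-swap, via std::atomic's compare_exchange_weak:

```cpp
#include <atomic>

class Spinlock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        bool expected = false;
        // CAS: atomically swap false -> true; fails while another thread
        // holds the lock. On failure, `expected` receives the current value.
        while (!locked.compare_exchange_weak(expected, true,
                                             std::memory_order_acquire)) {
            expected = false;  // reset and retry
        }
    }
    void unlock() {
        locked.store(false, std::memory_order_release);
    }
};
```

In real code std::atomic_flag (or an OS mutex) would be the idiomatic choice; the bool version just makes the CAS loop explicit.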
Beyond that, a processor usually provides extra instructions to make simple operations atomic. Incrementing a variable is one: it is otherwise an interruptible operation, since it requires a read-modify-write sequence. Needing it to be atomic is very common; most C++ programs rely on it, for example, to implement reference counting.
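A sketch of that use, assuming a hand-rolled RefCounted type (hypothetical; std::shared_ptr does the same internally):

```cpp
#include <atomic>

struct RefCounted {
    std::atomic<int> refs{1};

    void add_ref() {
        // Atomic read-modify-write; compiles to e.g. LOCK XADD on x86.
        refs.fetch_add(1, std::memory_order_relaxed);
    }
    // Returns true when the caller dropped the last reference and
    // should destroy the object.
    bool release() {
        // acq_rel so the destroying thread sees all prior writes.
        return refs.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};
```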
volatile guarantees thread safety
That is not true. It is an attribute that dates from much simpler times, when machines had only a single processor core. It only affects code generation, in particular how the code optimizer tries to eliminate memory accesses by keeping a copy of the value in a processor register instead. That makes a big, big difference in execution speed: reading a value from a register is easily three times faster than reading it from memory.
Applying volatile ensures that the code optimizer does not consider the register copy accurate and forces it to read memory again. It really only matters for memory that is not stable by itself: devices that expose their registers through memory-mapped I/O. It has been heavily abused as a way to put semantics on top of processors with a weak memory model, Itanium being the most egregious example. What you get from volatile today is highly dependent on the specific compiler and runtime you use. Never use it for thread safety; always use a synchronization primitive instead.
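A sketch of the difference, using a hypothetical producer/consumer flag: volatile would only stop the optimizer from caching the flag in a register, while std::atomic also provides the ordering guarantees.

```cpp
#include <atomic>
#include <thread>

// volatile bool ready = false;   // stops register caching, but gives no
//                                // atomicity or ordering: NOT thread-safe
std::atomic<bool> ready{false};   // the synchronization primitive to use

int payload = 0;

void producer() {
    payload = 42;
    ready.store(true, std::memory_order_release);  // publishes payload
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) { /* spin */ }
    // acquire pairs with release: payload is guaranteed to be 42 here.
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```

With the volatile version, a weakly ordered processor (the Itanium example above) could let the consumer see the flag before the payload write becomes visible.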
simply atomic / volatile is thread safe
Programming would be a lot simpler if that were true. Atomic operations only cover the very simplest of operations; a real program often needs to keep an entire object consistent: the requirement that all of its members are updated atomically and that a partially updated view of the object is never exposed. Something as simple as iterating a list is a core example: you cannot have another thread modifying the list while you are looking at its elements. That is when you need to reach for the higher-level synchronization primitives, the kind that can block code until it is safe to proceed.
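A minimal sketch of that list example, with a std::list guarded by a mutex (names hypothetical):

```cpp
#include <list>
#include <mutex>

std::mutex list_mutex;
std::list<int> shared_list;

void add(int value) {
    std::lock_guard<std::mutex> guard(list_mutex);
    shared_list.push_back(value);
}

int sum_all() {
    // The lock must be held for the *entire* iteration; per-element
    // atomicity would not stop another thread from inserting or erasing
    // nodes mid-traversal.
    std::lock_guard<std::mutex> guard(list_mutex);
    int sum = 0;
    for (int v : shared_list) sum += v;
    return sum;
}
```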
Real programs often suffer from this synchronization need and exhibit Amdahl's-law behavior: adding an extra thread does not actually make the program faster, and sometimes makes it slower. Whoever invents a better mousetrap is guaranteed a Nobel prize; we are still waiting.
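For reference, Amdahl's law puts a number on that behavior: if only a fraction p of the work can run in parallel, n threads give a speedup of

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}},
\qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}
```

so with, say, p = 0.9, no number of threads gets you past a 10x speedup, and that is before counting the synchronization overhead itself.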