I think you are mixing up atomic memory access with cache coherency. The first is the hardware support needed to build synchronization primitives in software (spin-locks, semaphores and mutexes); the second is the hardware support that lets several chips (several processors and peripherals) operate on the same bus while keeping a consistent view of main memory.
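To make the "synchronization primitives in software" point concrete, here is a minimal sketch of a test-and-set spin-lock built on GCC's legacy __sync builtins. The type simple_spinlock and the functions spin_lock, spin_unlock and run_spinlock_demo are hypothetical names chosen for illustration; a production lock would also add back-off or a pause instruction while spinning.

```c
#include <pthread.h>

#define MAX_THREADS 16

/* Hypothetical minimal spin-lock: 0 = free, 1 = held. */
typedef struct {
    int locked;
} simple_spinlock;

static void spin_lock(simple_spinlock *l) {
    /* __sync_lock_test_and_set atomically stores 1 and returns the previous
       value (acquire barrier). Spin until the previous value was 0, i.e. until
       this thread is the one that actually took the lock. */
    while (__sync_lock_test_and_set(&l->locked, 1)) {
        /* busy-wait */
    }
}

static void spin_unlock(simple_spinlock *l) {
    /* Stores 0 with release semantics, publishing writes made under the lock. */
    __sync_lock_release(&l->locked);
}

/* Shared state for the demo. */
static simple_spinlock demo_lock = {0};
static long demo_counter = 0;
static int demo_iters = 0;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < demo_iters; i++) {
        spin_lock(&demo_lock);
        demo_counter++;            /* plain increment, protected by the lock */
        spin_unlock(&demo_lock);
    }
    return NULL;
}

/* Runs nthreads workers (at most MAX_THREADS), each incrementing the shared
   counter iters times under the spin-lock. Returns the final counter value. */
long run_spinlock_demo(int nthreads, int iters) {
    pthread_t threads[MAX_THREADS];
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    demo_counter = 0;
    demo_iters = iters;
    for (int t = 0; t < nthreads; t++)
        pthread_create(&threads[t], NULL, worker, NULL);
    for (int t = 0; t < nthreads; t++)
        pthread_join(threads[t], NULL);
    return demo_counter;
}
```

Calling run_spinlock_demo(4, 10000) should return 40000: no increment is lost because every update happens while the lock is held.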
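Compilers expose atomic operations through intrinsics; GCC, for example, provides a set of atomic builtins. Depending on the architecture, these compile down to either compare-and-swap instructions or load-linked/store-conditional pairs. You can see exactly what gets generated by compiling with GCC's -S flag and reading the resulting assembly.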
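A typical use of compare-and-swap is a retry loop: read the current value, compute the new one, and attempt the swap only if nobody changed the value in between. The sketch below assumes GCC 4.7 or later (for the __atomic builtins); atomic_fetch_max and run_max_demo are hypothetical names used only for illustration.

```c
#include <pthread.h>
#include <stdint.h>

#define MAX_THREADS 16

/* Atomically raise *target to at least `value` with a compare-and-swap loop.
   Returns the value *target held just before this call's update (or the
   current value, if it was already >= value). */
static long atomic_fetch_max(long *target, long value) {
    long old = __atomic_load_n(target, __ATOMIC_SEQ_CST);
    /* On failure, __atomic_compare_exchange_n writes the value it actually
       found into `old`, so the loop re-checks against fresh data. */
    while (old < value &&
           !__atomic_compare_exchange_n(target, &old, value, 0 /* strong */,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) {
        /* another thread changed *target; retry */
    }
    return old;
}

static long shared_max = 0;
static int max_iters = 0;

static void *max_worker(void *arg) {
    /* Thread t offers the values t*iters .. t*iters + iters - 1. */
    long base = (long)(intptr_t)arg * max_iters;
    for (int i = 0; i < max_iters; i++)
        atomic_fetch_max(&shared_max, base + i);
    return NULL;
}

/* Runs nthreads workers (at most MAX_THREADS) and returns the final maximum. */
long run_max_demo(int nthreads, int iters) {
    pthread_t threads[MAX_THREADS];
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    shared_max = 0;
    max_iters = iters;
    for (intptr_t t = 0; t < nthreads; t++)
        pthread_create(&threads[t], NULL, max_worker, (void *)t);
    for (int t = 0; t < nthreads; t++)
        pthread_join(threads[t], NULL);
    return shared_max;
}
```

Compiling this with gcc -O2 -S is a quick way to see the point about generated code: on x86-64 the swap typically becomes a lock cmpxchg instruction, while on load-linked/store-conditional architectures it typically becomes an exclusive load/store retry loop.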
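On most architectures, a plain load or store of a naturally aligned value no wider than a machine word (such as an aligned int) is atomic by itself. Read-modify-write operations on that int, such as incrementing it, are not; for those you need the atomic intrinsics (see GCC's atomic builtins).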
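The difference between an atomic store and an atomic read-modify-write shows up with a shared counter. A plain counter++ is a separate load, add and store, so two threads can read the same old value and one increment is lost. The sketch below uses GCC's __atomic_fetch_add instead; run_atomic_add_demo is a hypothetical helper name.

```c
#include <pthread.h>

#define MAX_THREADS 16

static int hits = 0;        /* naturally aligned int shared by all threads */
static int hit_iters = 0;

static void *hit_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < hit_iters; i++) {
        /* Atomic read-modify-write: the load, add and store happen as one
           indivisible step, so no increment can be lost. Relaxed ordering is
           enough for a pure counter. */
        __atomic_fetch_add(&hits, 1, __ATOMIC_RELAXED);
    }
    return NULL;
}

/* Runs nthreads workers (at most MAX_THREADS), each adding 1 iters times.
   Returns the final counter value. */
int run_atomic_add_demo(int nthreads, int iters) {
    pthread_t threads[MAX_THREADS];
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    __atomic_store_n(&hits, 0, __ATOMIC_RELAXED);
    hit_iters = iters;
    for (int t = 0; t < nthreads; t++)
        pthread_create(&threads[t], NULL, hit_worker, NULL);
    for (int t = 0; t < nthreads; t++)
        pthread_join(threads[t], NULL);   /* join makes all updates visible */
    return __atomic_load_n(&hits, __ATOMIC_RELAXED);
}
```

Replacing the __atomic_fetch_add call with hits++ usually makes the final count come out short when several threads run on separate cores.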
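volatile does not solve this. It only tells the compiler not to keep the variable in a register or optimize accesses to it away; it does not make any operation atomic and does not insert memory barriers, so other processors can still observe writes in a different order. volatile is not a synchronization mechanism.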
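A common misuse of volatile is as a "data is ready" flag between threads. A sketch of the alternative, assuming GCC's __atomic builtins, publishes the flag with a release store and reads it with an acquire load, which guarantees that the consumer also sees the payload written before the flag. The names producer and consume_published_value are hypothetical.

```c
#include <pthread.h>

static int payload = 0;   /* ordinary data written by the producer */
static int ready = 0;     /* flag: 1 once payload is valid */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;
    /* Release store: every write before it (payload) is visible to a thread
       that reads ready == 1 with an acquire load. Marking `ready` volatile
       would give no such cross-thread ordering guarantee. */
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
    return NULL;
}

/* Starts a producer thread, spins until the flag is set, then returns the
   payload that was published. */
int consume_published_value(void) {
    pthread_t tid;
    payload = 0;
    __atomic_store_n(&ready, 0, __ATOMIC_RELAXED);
    pthread_create(&tid, NULL, producer, NULL);
    while (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE)) {
        /* spin; the atomic load is re-executed on every iteration */
    }
    int value = payload;   /* ordered after the acquire load: sees 42 */
    pthread_join(tid, NULL);
    return value;
}
```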
Edit:
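The upcoming C++0x standard addresses this directly: it defines a memory model and standard library support for concurrency, including atomic types. Hans Boehm's material on the C++ memory model is a good place to start.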