Atomicity and memory ordering
For an operation to be atomic, it must appear to be one indivisible operation to any observer. The observer can be anything that can see the effect of the operation: the thread that performed the operation, another thread on the same processor, a thread on another processor, or some component or device in the system. Observers that cannot see the effect of the operation, whether the same thread, a different thread, or a device, do not affect whether the operation is atomic.
(Note that by processor I mean what Intel's documentation calls a logical processor. A system with two processor sockets, each holding a quad-core processor with two logical processors per core, has a total of 16 processors.)
A related but different concept is memory ordering. Memory accesses are sequentially consistent only if they appear to an observer to occur in the order in which they occur in the program. This guarantee always applies when the observer is the same thread that performs the operations. Other, more limited memory ordering guarantees are also possible. A strong but not sequentially consistent ordering might guarantee that many types of operations are ordered relative to each other, but not all. A weak memory ordering gives no guarantee about how accesses appear to other threads.
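A minimal C++11 sketch of what a limited ordering guarantee buys you, assuming the usual "message passing" pattern (the names `payload`, `ready`, `producer`, `consumer` and `run_message_passing` are hypothetical). A release store paired with an acquire load is weaker than sequential consistency, but it is enough to make the plain write to `payload` visible to the thread that sees `ready == true`.

```cpp
#include <atomic>
#include <cassert>  // for the usage check below
#include <thread>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the data

void producer() {
    payload = 42;                                  // (1) plain store
    ready.store(true, std::memory_order_release);  // (2) (1) may not be moved after this store
}

int consumer() {
    // Acquire: once this load returns true, the store in (1) is visible too.
    while (!ready.load(std::memory_order_acquire)) {
        // spin
    }
    return payload;
}

int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);  // reset before starting threads
    int seen = 0;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Usage: `assert(run_message_passing() == 42);`. If both operations used `std::memory_order_relaxed` instead, the read of `payload` would no longer be ordered after the write, and the program would have a data race on `payload`.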
Compilers and atomicity
When you write a program in C or some other higher-level language, it may appear that certain operations are atomic and sequentially ordered, but the compiler generally guarantees this only when they are viewed from the same thread that performs them. Moreover, from the compiler's point of view, any code that runs when a thread is asynchronously interrupted executes in a different thread of execution, even if it runs on the same OS thread. This means that code running in a signal handler or a structured exception handler is not guaranteed to see operations performed outside the handler, in the same thread, as atomic or sequentially consistent.
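For signal handlers, the C and C++ standards provide `volatile sig_atomic_t` as the type that a handler can safely write and the interrupted code can safely read (since C++17, lock-free `std::atomic` objects are also allowed). A small sketch, where `got_signal`, `on_signal` and `handler_was_observed` are hypothetical names and `std::raise` is used only to trigger the handler synchronously:

```cpp
#include <cassert>  // for the usage check below
#include <csignal>

// The flag shared between the handler and the interrupted code.
volatile std::sig_atomic_t got_signal = 0;

extern "C" void on_signal(int) {
    got_signal = 1;  // touch only the sig_atomic_t flag inside the handler
}

bool handler_was_observed() {
    got_signal = 0;
    if (std::signal(SIGINT, on_signal) == SIG_ERR) {
        return false;
    }
    std::raise(SIGINT);            // the handler runs before raise() returns
    std::signal(SIGINT, SIG_DFL);  // restore the default disposition
    return got_signal == 1;
}
```

Usage: `assert(handler_was_observed());`. A plain `int` or `bool` flag carries no such guarantee, because the compiler treats the handler as a separate thread of execution.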
Because of this limited general guarantee, the compiler is free to implement what look like atomic operations using several assembly instructions, making them appear non-atomic to other observers. The compiler can also reorder memory accesses, and even remove apparently redundant accesses entirely. It can perform whatever optimizations it wants, as long as, in the single uninterrupted thread case, the program still behaves as if it performed all the operations in program order.
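The "redundant access" case is easiest to see with a busy-wait loop. The sketch below assumes a hypothetical `stop_flag` that one thread spins on while another sets it. With a plain `bool`, nothing inside the loop modifies the flag, so the compiler may load it once and spin forever (and the concurrent write would be a data race). Making the flag `std::atomic` forces a fresh load on every iteration.

```cpp
#include <atomic>
#include <cassert>  // for the usage check below
#include <thread>

// With `bool stop_flag;` the loop below could legally be compiled as
// `if (!stop_flag) for (;;) ++iterations;`. The atomic type forbids that.
std::atomic<bool> stop_flag{false};

long spin_until_stopped() {
    long iterations = 0;
    while (!stop_flag.load()) {  // re-read on every iteration
        ++iterations;
    }
    return iterations;
}

bool worker_stops() {
    stop_flag.store(false);
    long n = -1;
    std::thread t([&] { n = spin_until_stopped(); });
    stop_flag.store(true);  // the spinning thread eventually observes this
    t.join();
    return n >= 0;          // true once the worker has left the loop
}
```

Usage: `assert(worker_stops());`.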
In the multithreaded case, or where signal or exception handlers are present, special steps must be taken to tell the compiler where it must provide wider guarantees of atomicity and memory ordering. This is the purpose of the special atomic types and functions. Even if the CPU guarantees that every instruction is atomic and every memory access is sequentially consistent to all other threads, the compiler does not.
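As a sketch of those atomic types in use, assume several threads incrementing one shared counter (`count_with_atomic` is a hypothetical helper). `counter++` on a plain `int` is a separate load, add and store that other threads can interleave with, and is a data race; `fetch_add` on a `std::atomic<int>` is a single atomic read-modify-write.

```cpp
#include <atomic>
#include <cassert>  // for the usage check below
#include <thread>
#include <vector>

int count_with_atomic(int threads, int per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back([&] {
            for (int j = 0; j < per_thread; ++j) {
                // Relaxed is enough here: only atomicity is needed, and
                // join() below makes the final value visible to this thread.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& t : pool) {
        t.join();
    }
    return counter.load();
}
```

Usage: `assert(count_with_atomic(4, 100000) == 400000);`. Replacing `counter` with a plain `int` removes that guarantee, and the total can come out lower.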
Intel processors and atomicity
Intel processors make it fairly easy to provide these guarantees to the compiler. With some odd exceptions, instructions cannot be interrupted partway through. Any event that interrupts the execution of an instruction either occurs after the instruction has fully completed or allows the instruction to be restarted as if it had never been executed. This means that at the machine code level, every operation is atomic and every memory operation is sequentially consistent as seen by code running on the same processor. In the single-processor case nothing needs to be done to provide these guarantees, except when they must be visible to devices other than the processor. In that case, the LOCK prefix combined with uncached memory regions must be used to guarantee that read-modify-write instructions are atomic and that memory accesses appear sequentially consistent to other devices.
In the multiprocessor case, when accessing cached memory, the cache coherency protocol provides atomicity guarantees for most instructions and a strong memory ordering, but not a sequentially consistent one. The exact mechanism by which it does this does not really matter; only the guarantees it gives do. Any instruction that accesses only a single memory location will be atomic to other processors. The ordering guarantees are too long to go into here (Intel uses 16 separate rules to describe them), but they are a superset of the guarantees that C and C++ provide with acquire and release memory ordering. When that level of memory ordering is specified, C/C++ atomic operations can use ordinary, unlocked instructions.
The need for the LOCK prefix, and for the instructions where the LOCK prefix is implicit (such as XCHG with a memory operand), arises when you need stronger guarantees than the cache coherency protocol provides. If you need your read-modify-write instructions to be atomic, you need to use the LOCK prefix. If you need sequentially consistent ordering, you need to use the LOCK prefix.
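To connect this to C++ source, here is a sketch of one atomic variable accessed with different memory orders, annotated with the x86-64 instructions GCC and Clang typically emit at -O2. The annotations are not guaranteed by any standard; check your own compiler's output. The function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>  // for the usage check below

std::atomic<int> value{0};

int load_acquire() {
    return value.load(std::memory_order_acquire);  // plain mov (seq_cst loads too)
}

void store_release(int v) {
    value.store(v, std::memory_order_release);     // plain mov
}

void store_seq_cst(int v) {
    // xchg with a memory operand (implicitly locked), or mov followed by mfence
    value.store(v, std::memory_order_seq_cst);
}

int increment() {
    return value.fetch_add(1);                     // lock xadd
}

bool try_replace(int expected, int desired) {
    // lock cmpxchg: atomic compare-and-swap
    return value.compare_exchange_strong(expected, desired);
}
```

Only the last three functions need a locked instruction: the acquire load and release store are already covered by the ordering the cache coherency protocol provides.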
The LOCK prefix is where the high cost of atomic operations comes from. It causes the processor to wait for all previous loads and stores to complete. Even though, when accessing cached memory, a LOCK-prefixed instruction is handled entirely within the cache without asserting the LOCK# signal, the processor still has to wait to ensure that the operation appears sequentially consistent to other processors.
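A rough way to see this cost is a single-threaded micro-benchmark, so there is no cache-line contention and any difference comes from the locked instruction itself. This is a hypothetical sketch (`cell`, `seconds` and `compare_costs` are illustrative names): it times release stores (typically a plain `mov` on x86-64) against `fetch_add` (typically `lock xadd`). Absolute timings depend on the CPU, so none are claimed here.

```cpp
#include <atomic>
#include <cassert>  // for the usage check below
#include <chrono>
#include <cstdio>

std::atomic<long> cell{0};

// Runs f once and returns the elapsed wall-clock time in seconds.
template <typename F>
double seconds(F f) {
    auto start = std::chrono::steady_clock::now();
    f();
    std::chrono::duration<double> d = std::chrono::steady_clock::now() - start;
    return d.count();
}

void compare_costs(long n) {
    double plain = seconds([&] {
        for (long i = 0; i < n; ++i) {
            cell.store(i, std::memory_order_release);  // unlocked store
        }
    });
    double locked = seconds([&] {
        for (long i = 0; i < n; ++i) {
            cell.fetch_add(1);                         // locked read-modify-write
        }
    });
    std::printf("release stores: %.6fs, fetch_add: %.6fs\n", plain, locked);
}
```

Usage: call `compare_costs(100000000)` in an optimized build; on typical x86 CPUs the `fetch_add` loop is usually noticeably slower.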
Summary
So, to answer your questions:
- The cache coherency protocol can only guarantee the atomicity of certain machine code instructions as seen from other processors. It cannot guarantee that the compiler generates a single instruction for the operation you want to be atomic. Nor can it guarantee that the instruction appears atomic to non-processor devices in the system.
- The LOCK prefix is used on machine code instructions that:
  - perform multiple memory accesses and need to appear atomic to other processors;
  - need to be sequentially consistent with respect to other processors;
  - need to be atomic and/or sequentially consistent with respect to non-processor devices.
- When the necessary atomicity and memory ordering guarantees can be obtained without the LOCK prefix, the instructions used are the same as ordinary instructions and therefore cost the same. Where the LOCK prefix is needed to provide those guarantees, the instruction becomes much more expensive than an ordinary one.