No, there is no need to use the MFENCE, SFENCE and LFENCE instructions together with the LOCK prefix. On x86, a LOCK-prefixed instruction already acts as a full memory barrier.
The MFENCE, SFENCE and LFENCE instructions guarantee the visibility and ordering of memory operations across all CPU cores. For example, the MOV instruction cannot be used with the LOCK prefix, so to make sure that the result of a memory move is visible to all CPU cores, we must be sure that the write has actually left the core (drained from its store buffer and made globally visible), and that is what we achieve with the fence instructions.
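To make the distinction concrete, here is a minimal C++ sketch (C++ is my choice for illustration, and the function names `count_with_locked_rmw` and `publish_with_fence` are hypothetical). It assumes a typical x86 compiler, which usually lowers `fetch_add` to a LOCK-prefixed instruction and a sequentially consistent `std::atomic_thread_fence` to MFENCE or an equivalent locked instruction; the exact code generation is not guaranteed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Locked read-modify-write: the atomic increment is the whole story,
// no separate fence instruction is written around it.
int count_with_locked_rmw(int threads, int iterations) {
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                counter.fetch_add(1);  // sequentially consistent by default
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}

// Plain store (a MOV, which cannot take LOCK): ordering is requested
// explicitly with fences around relaxed accesses to a flag.
int publish_with_fence() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag accessed with relaxed ordering
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                         // plain store
        std::atomic_thread_fence(std::memory_order_seq_cst);  // explicit fence
        ready.store(true, std::memory_order_relaxed);
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_seq_cst);  // pairs with the producer's fence
        assert(payload == 42);  // the plain store is visible after the fences
        seen = payload;
    });
    producer.join();
    consumer.join();
    return seen;
}
```

Calling `count_with_locked_rmw(4, 10000)` should return 40000 without any fence in the source, while `publish_with_fence()` relies on the two fences to make the plain store to `payload` visible to the consumer.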
EDIT: More on locked atomic operations, from the Intel manual:
LOCKED ATOMIC OPERATIONS
The 32-bit IA-32 processors support locked atomic operations on locations in system memory. These operations are typically used to manage shared data structures (such as semaphores, segment descriptors, system segments, or page tables) in which two or more processors may try simultaneously to modify the same field or flag. The processor uses three interdependent mechanisms for carrying out locked atomic operations:
• Guaranteed atomic operations
• Bus locking, using the LOCK# signal and the LOCK instruction prefix
• Cache coherency protocols that ensure that atomic operations can be carried out on cached data structures (cache lock); this mechanism is present in the Pentium 4, Intel Xeon, and P6 family processors.
These mechanisms are interdependent in the following ways. Certain basic memory transactions (such as reading or writing a byte in system memory) are always guaranteed to be handled atomically. That is, once started, the processor guarantees that the operation will be completed before another processor or bus agent is allowed access to the memory location. The processor also supports bus locking for performing selected memory operations (such as a read-modify-write operation in a shared area of memory) that typically need to be handled atomically, but are not automatically handled this way. Because frequently used memory locations are often cached in a processor's L1 or L2 caches, atomic operations can often be carried out inside a processor's caches without asserting the bus lock. Here the processor's cache coherency protocols ensure that other processors that are caching the same memory locations are managed properly while atomic operations are performed on cached memory locations.
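As an illustration of the kind of shared structure the manual has in mind, the following is a rough C++ sketch of a test-and-set spinlock guarding a plain field. The `SpinLock` class and `guarded_sum` function are hypothetical names, and the sketch assumes the compiler maps `test_and_set` to an atomic read-modify-write instruction, as is usual on x86.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock built on one atomic read-modify-write.
class SpinLock {
public:
    void lock() {
        // test_and_set atomically reads the old value and writes "set";
        // we spin until the old value was "clear".
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Several threads increment an ordinary (non-atomic) field under the lock.
long guarded_sum(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    SpinLock lock;
    long total = 0;  // shared field that more than one thread modifies
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++total;  // safe only because the lock serializes access
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}
```

With the lock in place, `guarded_sum(4, 5000)` should return 20000. Note that no MFENCE, SFENCE or LFENCE appears anywhere in the source; the atomic operations carry the ordering themselves.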