atomic_thread_fence(memory_order_seq_cst) always generates a full barrier:

- x86_64: MFENCE
- PowerPC: hwsync
- Itanium: mf
- ARMv7 / ARMv8: dmb ish
- MIPS64: sync
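For illustration, a minimal sketch (the function name is illustrative) of the fence that lowers to one of the instructions above:

#include <atomic>

void full_barrier() {
    // Lowers to the full-barrier instruction listed above:
    // MFENCE on x86_64, hwsync on PowerPC, mf on Itanium,
    // dmb ish on ARMv7/ARMv8, sync on MIPS64.
    std::atomic_thread_fence(std::memory_order_seq_cst);
}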
The main point: the observing thread may simply observe the operations in a different order, and it does not matter which barriers you use in the observed thread.
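A minimal sketch of this point (variable and function names are illustrative): even though the writer thread issues a full fence, a reader that uses only relaxed loads, with no ordering of its own, may still observe the stores out of order:

#include <atomic>

std::atomic<int> x{0}, y{0};

void writer() {
    x.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst); // barrier in the observed thread
    y.store(1, std::memory_order_relaxed);
}

void reader() {
    // No barrier here: this thread may still observe y == 1 and x == 0,
    // because nothing orders its own two relaxed loads.
    int r1 = y.load(std::memory_order_relaxed);
    int r2 = x.load(std::memory_order_relaxed);
    (void)r1; (void)r2;
}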
Is an optimizing compiler allowed to reorder instruction (3) before instruction (1)?
No, it is forbidden. But in a globally visible multithreaded program this holds only if:

- other threads use the same memory_order_seq_cst for atomic read/write operations on these variables (a sketch of this case follows the list),
- or other threads likewise use atomic_thread_fence(memory_order_seq_cst); between load() and store(); this approach, however, does not guarantee sequential consistency in general, since sequential consistency is a stronger guarantee.
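A sketch of the first case (illustrative names): the classic Dekker-style litmus test in which both threads use memory_order_seq_cst, so the outcome r1 == 0 && r2 == 0 is impossible:

#include <atomic>

std::atomic<int> x{0}, y{0};

// With memory_order_seq_cst on every operation in both threads,
// neither load can be reordered before the store in its own thread.
void thread_a(int& r1) {
    y.store(1, std::memory_order_seq_cst);
    r1 = x.load(std::memory_order_seq_cst);
}

void thread_b(int& r2) {
    x.store(1, std::memory_order_seq_cst);
    r2 = y.load(std::memory_order_seq_cst);
}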
Working Draft, Standard for Programming Language C++, 2016-07-12 (N4606): http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf
§ 29.3 Order and consistency
§ 29.3/8
[ Note: memory_order_seq_cst ensures sequential consistency only for a program that is free of data races and uses exclusively memory_order_seq_cst operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In particular, memory_order_seq_cst fences ensure a total order only for the fences themselves. Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering specifications. - end note ]
How this maps to assembly:
Case-1:

atomic<int> x, y;
y.store(1, memory_order_relaxed);          // (1)
atomic_thread_fence(memory_order_seq_cst); // (2)
x.load(memory_order_relaxed);              // (3)
This code is not always equivalent to Case-2, but it generates the same instructions between STORE and LOAD as Case-2, in which both LOAD and STORE use memory_order_seq_cst; that is Sequential Consistency, which prevents StoreLoad reordering. Case-2:
atomic<int> x, y;
y.store(1, memory_order_seq_cst); // (1)
x.load(memory_order_seq_cst);     // (2)
A few notes:
From the manual for ARMv8-A, Table 13.1 (Barrier parameters):

ISH: Any - Any

"Any - Any" means that both loads and stores must complete before the barrier, and both loads and stores that appear after the barrier in program order must wait for the barrier to complete.
Reordering of two instructions can be prevented by placing suitable instructions between them. And as we can see, a STORE (seq_cst) followed by a LOAD (seq_cst) generates the same instructions between them as a FENCE (seq_cst), i.e. atomic_thread_fence(memory_order_seq_cst).
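One way to check this claim (function names are illustrative; any compiler will do) is to compile both cases side by side and compare the instructions emitted between the store and the load:

#include <atomic>

std::atomic<int> x{0}, y{0};

// Case-1: relaxed operations separated by an explicit full fence.
int case1() {
    y.store(1, std::memory_order_relaxed);               // (1)
    std::atomic_thread_fence(std::memory_order_seq_cst); // (2)
    return x.load(std::memory_order_relaxed);            // (3)
}

// Case-2: seq_cst operations with no explicit fence.
int case2() {
    y.store(1, std::memory_order_seq_cst);
    return x.load(std::memory_order_seq_cst);
}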
Mapping of C/C++11 memory_order_seq_cst to different CPU architectures for load(), store(), and atomic_thread_fence(). Note that atomic_thread_fence(memory_order_seq_cst) always generates a full barrier:
- x86_64: STORE: MOV (into memory), MFENCE | LOAD: MOV (from memory) | fence: MFENCE
- x86_64-alt: STORE: MOV (into memory) | LOAD: MFENCE, MOV (from memory) | fence: MFENCE
- x86_64-alt3: STORE: (LOCK) XCHG | LOAD: MOV (from memory) | fence: MFENCE (full barrier)
- x86_64-alt4: STORE: MOV (into memory) | LOAD: LOCK XADD(0) | fence: MFENCE (full barrier)
- PowerPC: STORE: hwsync; st | LOAD: hwsync; ld; cmp; bc; isync | fence: hwsync
- Itanium: STORE: st.rel; mf | LOAD: ld.acq | fence: mf
- ARMv7: STORE: dmb ish; str; dmb ish | LOAD: ldr; dmb ish | fence: dmb ish
- ARMv7-alt: STORE: dmb ish; str | LOAD: dmb ish; ldr; dmb ish | fence: dmb ish
- ARMv8 (AArch32): STORE: STL | LOAD: LDA | fence: dmb ish (full barrier)
- ARMv8 (AArch64): STORE: STLR | LOAD: LDAR | fence: dmb ish (full barrier)
- MIPS64: STORE: sync; sw; sync | LOAD: sync; lw; sync | fence: sync
The complete mapping of C/C++11 semantics to different processor architectures for load(), store(), and atomic_thread_fence() is described here: <a13> (a small AArch64 example follows below).
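As a quick illustration of the ARMv8 (AArch64) row above (assuming that mapping; names are illustrative), the three seq_cst operations are expected to lower as follows:

#include <atomic>

std::atomic<int> x{0};

void demo() {
    x.store(1, std::memory_order_seq_cst);               // expected: STLR
    (void)x.load(std::memory_order_seq_cst);             // expected: LDAR
    std::atomic_thread_fence(std::memory_order_seq_cst); // expected: dmb ish
}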
Since Sequential Consistency prevents StoreLoad reordering, and since a store(memory_order_seq_cst) followed by a load(memory_order_seq_cst) generates the same instructions between them as atomic_thread_fence(memory_order_seq_cst), it follows that atomic_thread_fence(memory_order_seq_cst) prevents StoreLoad reordering as well.
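A fence-based sketch of the same litmus test (illustrative names): when both threads place the seq_cst fence between their relaxed store and load, the outcome r1 == 0 && r2 == 0 is likewise ruled out:

#include <atomic>

std::atomic<int> x{0}, y{0};

// The seq_cst fence acts as a full barrier between each thread's
// store and load, preventing StoreLoad reordering in both threads.
void thread_a(int& r1) {
    y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    r1 = x.load(std::memory_order_relaxed);
}

void thread_b(int& r2) {
    x.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    r2 = y.load(std::memory_order_relaxed);
}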