For this question, I'm only interested in x86 and x86-64.
For MSVC 2005, the documentation for __faststorefence says: "Guarantees that every preceding store is globally visible before any subsequent store."
For MSVC 2008 and 2010, it changed to: "Guarantees that every previous memory reference, including both load and store memory references, is globally visible before any subsequent memory reference."
The way the latter is worded implies, to my mind, that it should also prevent the CPU from reordering loads with older stores. This differs from the first definition, which implies that the intrinsic only prevents non-temporal stores from being reordered with older stores (the only other reordering x86(-64) does).
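To make the "loads reordered with older stores" case concrete, here is a minimal store-buffering litmus test sketch in C11. It is my own illustration, not something from the MSVC documentation: it uses atomic_thread_fence(memory_order_seq_cst) as a stand-in for a full fence rather than __faststorefence itself, and the helper names (store_then_load, worker, count_both_zero) are hypothetical. Each thread stores to its own flag and then loads the other thread's flag. Under sequential consistency at least one load must see 1, so a round where both loads read 0 means a load was satisfied before the older store became visible (via the CPU's store buffer, or because the compiler reordered the relaxed operations).

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int x, y;      /* the two flags; reset to 0 before every round */
static atomic_int round_no;  /* main thread bumps this to start a round */
static atomic_int finished;  /* each worker adds 1 after finishing a round */
static int r1, r2;           /* what each worker loaded in the current round */
static bool use_fence;       /* set before the workers start, read-only after */
static int total_rounds;

/* Store to our own flag, then load the other thread's flag. */
static void store_then_load(atomic_int *mine, atomic_int *other, int *out)
{
    atomic_store_explicit(mine, 1, memory_order_relaxed);
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst); /* full (StoreLoad) fence */
    *out = atomic_load_explicit(other, memory_order_relaxed);
}

static void *worker(void *arg)
{
    int id = *(const int *)arg;
    for (int r = 1; r <= total_rounds; r++) {
        while (atomic_load_explicit(&round_no, memory_order_acquire) < r)
            sched_yield(); /* wait for round r to start */
        if (id == 0)
            store_then_load(&x, &y, &r1);
        else
            store_then_load(&y, &x, &r2);
        atomic_fetch_add_explicit(&finished, 1, memory_order_release);
    }
    return NULL;
}

/* Run `rounds` rounds; return how many ended with both loads reading 0. */
static int count_both_zero(int rounds, bool fence)
{
    pthread_t t[2];
    int ids[2] = {0, 1};
    int hits = 0;

    use_fence = fence;
    total_rounds = rounds;
    atomic_store(&round_no, 0);
    atomic_store(&finished, 0);
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int r = 1; r <= rounds; r++) {
        atomic_store_explicit(&x, 0, memory_order_relaxed);
        atomic_store_explicit(&y, 0, memory_order_relaxed);
        atomic_store_explicit(&round_no, r, memory_order_release);
        while (atomic_load_explicit(&finished, memory_order_acquire) < 2 * r)
            sched_yield(); /* wait for both workers to finish round r */
        if (r1 == 0 && r2 == 0)
            hits++;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return hits;
}
```

With the fence, C11 forbids the both-zero outcome, so count_both_zero(n, true) should always return 0. Without it, whether a nonzero count ever shows up depends on the hardware, the compiler and thread timing. On x86, GCC typically compiles the seq_cst fence to mfence or a locked instruction.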
However, the documentation then seems to contradict itself: "On the x64 platform, this routine generates an instruction that is a faster store fence than the sfence instruction. Use this intrinsic instead of _mm_sfence on the x64 platform."
This implies that it still has sfence-like functionality, so loads can still be reordered with older stores. So which is it? Can someone clear up my confusion?
PS: While looking for a GCC version of this function, I came across the following, but I think it is from 32-bit code:

    long local; __asm__ __volatile__("lock; orl $0, %0;" : : "m"(local));

What would the 64-bit analogue be?
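A possible 64-bit analogue, sketched under the assumption of GCC-style extended inline asm: lock; orl $0, mem is also a valid instruction in 64-bit mode (only the operand size matters, not the mode), and any LOCK-prefixed instruction is a full barrier on x86/x86-64, so it also keeps loads from being reordered with older stores. The sketch differs from the quoted snippet in two ways: the operand is declared "+m" because the instruction writes it, and a "memory" clobber makes GCC treat the asm as a compiler barrier as well. The names lock_or_fence and store_fence_load are hypothetical.

```c
#include <assert.h>

/* Full memory barrier built from a locked no-op read-modify-write.
   Loads and stores are not reordered across a LOCK-prefixed instruction
   on x86/x86-64, so this also covers the StoreLoad case. */
static inline void lock_or_fence(void)
{
#if defined(__x86_64__) || defined(__i386__)
    int dummy = 0;
    /* orl on a 32-bit operand is valid in both 32- and 64-bit mode. */
    __asm__ __volatile__("lock; orl $0, %0" : "+m"(dummy) : : "memory", "cc");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST); /* portable GCC builtin fallback */
#endif
}

/* The pattern the question is about: publish our own flag, then read the
   other thread's flag. volatile mirrors the pre-C11 style of the quoted
   snippet; new code would normally use C11 atomics instead. */
static int store_fence_load(volatile int *mine, volatile int *other)
{
    assert(mine != other); /* the pattern needs two distinct flags */
    *mine = 1;
    lock_or_fence();       /* the store must be visible before the load */
    return *other;
}
```

If hand-written asm isn't required, GCC's __atomic_thread_fence(__ATOMIC_SEQ_CST) or __sync_synchronize() builtins also give a full barrier.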