There are read barriers and write barriers; acquire barriers and release barriers. And more (I/O vs. memory, etc.).
Barriers are not there to control the "latest" value or the "freshness" of values. They are there to control the relative ordering of memory accesses.
Write barriers control the order of writes. Because writes to memory are slow (compared to the speed of the CPU), there is usually a write-request queue where writes are posted before they "really happen". Although they are queued in order, the writes may be reordered while inside the queue, unless you use write barriers to prevent the reordering. (So maybe "queue" isn't the best name...)
Read barriers control the order of reads. Reads may happen out of order for two reasons: speculative execution (the CPU looks ahead and loads from memory early) and the existence of the write buffer (the CPU will read a value from the write buffer instead of memory if it is there; i.e. the CPU thinks it just wrote X = 5, so why read it back, when it can see the 5 still waiting in the write buffer).
This is true regardless of what the compiler tries to do with the order of the generated code. That is, "volatile" in C++ won't help here, because it only tells the compiler to emit code that re-reads the value from "memory"; it does NOT tell the CPU how or where to read it from (i.e. "memory" is many things at the CPU level).
So read/write barriers put up blocks to prevent reordering in the read/write queues (reads don't usually go through much of a queue, but the reordering effects are the same).
What kinds of blocks? Acquire and/or release blocks.
Acquire - e.g. read-acquire(x) will add the read of x to the read queue and flush the queue (it does not really flush the queue, but adds a marker saying "don't reorder anything to before this read", which is as if the queue had been flushed). So later (in code order) reads can be reordered among themselves, but not to before the read of x.
Release - e.g. write-release(x, 5) will flush (or mark) the queue first, then add the write request to the write queue. So earlier writes won't be reordered to happen after x = 5, but note that later writes can be reordered to before x = 5.
Note that I paired the read with acquire and the write with release because this is typical, but different combinations are possible.
Acquire and release are considered "half-barriers" or "half-fences" because they only stop reordering in one direction.
A full barrier (or full fence) applies both an acquire and a release, i.e. no reordering in either direction.
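As a rough illustration of how half-fences and full fences look in code, here is a minimal sketch using C++11 `std::atomic_thread_fence`. The choice of C++11 and the names `Channel`, `produce`, `consume` and `run_fence_demo` are my own assumptions for the example. Also note that the C++ fences are defined in terms of the language's memory model, not hardware queues, so treat the "queue marker" picture as an analogy only.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Channel {
    int payload = 0;                 // plain data being handed over
    std::atomic<bool> flag{false};   // only touched with relaxed operations
};

void produce(Channel& ch) {
    ch.payload = 42;
    // Release fence: earlier reads/writes may not move below later stores.
    std::atomic_thread_fence(std::memory_order_release);
    ch.flag.store(true, std::memory_order_relaxed);
}

int consume(Channel& ch) {
    while (!ch.flag.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Acquire fence: later reads/writes may not move above the earlier load.
    std::atomic_thread_fence(std::memory_order_acquire);
    return ch.payload;
}

// A full fence would be std::atomic_thread_fence(std::memory_order_seq_cst).

// Hypothetical driver: runs one producer and one consumer thread.
int run_fence_demo() {
    Channel ch;
    int seen = 0;
    std::thread consumer([&] { seen = consume(ch); });
    std::thread producer([&] { produce(ch); });
    producer.join();
    consumer.join();
    assert(seen == 42);  // the payload must be visible once the flag is seen
    return seen;
}
```

Calling `run_fence_demo()` from `main` should hand the payload across; removing either fence leaves the read of `payload` unordered with respect to the write (a data race in C++ terms).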
Typically, for lock-free programming, or for C# or Java 'volatile', what you want/need is read-acquire and write-release.
    void threadA()
    {
        foo->x = 10;
        foo->y = 11;
        foo->z = 12;
        write_release(foo->ready, true);
        bar = 13;
    }

    void threadB()
    {
        w = some_global;
        ready = read_acquire(foo->ready);
        if (ready)
        {
            q = w * foo->x * foo->y * foo->z;
        }
        else
            calculate_pi();
    }
So, first of all, this is a bad way to program threads. Locks would be safer. But just to illustrate barriers...
After threadA() is done writing foo, it needs to write foo->ready LAST, really last, or else other threads might see foo->ready early and get the wrong values of x/y/z. So we use a write_release on foo->ready, which, as mentioned above, effectively "flushes" the write queue (ensuring x, y, z are committed) and then adds the ready = true request to the queue. After that it adds the bar = 13 request. Note that since we only used a release barrier (not a full one), bar = 13 may get written before ready. But we don't care! That is, we are assuming bar is not changing shared data.
Now threadB() needs to know that when we say "ready" we really mean ready. So we do a read_acquire(foo->ready). This read is added to the read queue, THEN the queue is flushed. Note that w = some_global may also still be in the queue, so foo->ready may be read before some_global. But again, we don't care, as it is not part of the important data that we are being so careful about. What we do care about is foo->x/y/z. Those are added to the read queue after the acquire flush/marker, guaranteeing that they are read only after reading foo->ready.
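For anyone who wants to run the threadA()/threadB() example, here is a minimal sketch in C++11, assuming `std::atomic` with `memory_order_release` / `memory_order_acquire` as stand-ins for write_release / read_acquire. The `Foo` layout, the `-1` "not ready" return value and the `run_handoff` driver are my own additions for illustration; `bar`, `some_global` and `calculate_pi()` are left out to keep it short.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Foo {
    int x = 0, y = 0, z = 0;          // plain, non-atomic payload
    std::atomic<bool> ready{false};   // the flag that publishes the payload
};

// Writer: fill in the payload, then publish it with a release store.
void thread_a(Foo& foo) {
    foo.x = 10;
    foo.y = 11;
    foo.z = 12;
    // Earlier writes (x, y, z) cannot be reordered to after this store.
    foo.ready.store(true, std::memory_order_release);
}

// Reader: returns x*y*z once the payload is published, or -1 if not ready.
int thread_b(const Foo& foo) {
    // Later reads (x, y, z) cannot be reordered to before this load.
    if (foo.ready.load(std::memory_order_acquire)) {
        return foo.x * foo.y * foo.z;
    }
    return -1;
}

// Hypothetical driver: the reader polls until the writer has published.
int run_handoff() {
    Foo foo;
    int q = -1;
    std::thread b([&] {
        while ((q = thread_b(foo)) == -1) {
            std::this_thread::yield();
        }
    });
    std::thread a([&] { thread_a(foo); });
    a.join();
    b.join();
    assert(q == 10 * 11 * 12);
    return q;
}
```

The reader only touches x/y/z after it has seen ready == true through the acquire load, which is the pairing described above.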
Note also that these are typically the exact same barriers used for locking and unlocking a mutex/CriticalSection/etc. (i.e. acquire on lock(), release on unlock()).
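To make the lock()/unlock() connection concrete, here is a minimal spinlock sketch in C++11 built on `std::atomic_flag`. The `SpinLock` class and the `count_with_spinlock` helper are hypothetical names for this example, and a real mutex does more than this (it blocks instead of spinning, for one), so take it only as an illustration of where the acquire and the release go.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SpinLock {
    std::atomic_flag locked_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // Acquire: nothing in the critical section may move above this.
        while (locked_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        // Release: nothing in the critical section may move below this.
        locked_.clear(std::memory_order_release);
    }
};

// Hypothetical helper: several threads increment a plain counter
// that is protected only by the spinlock.
long count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    assert(counter == static_cast<long>(threads) * iterations);
    return counter;
}
```

The two half-barriers together keep everything between lock() and unlock() inside the critical section, while still letting unrelated code outside it be reordered into it, which is harmless.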
So, I'm pretty sure this (i.e. acquire/release) is exactly what the MS docs say happens for reads/writes of 'volatile' variables in C# (and optionally for MS C++, but that is non-standard). See http://msdn.microsoft.com/en-us/library/aa645755(VS.71).aspx including "A volatile read has "acquire semantics"; that is, it is guaranteed to occur prior to any references to memory that occur after it ..."
I think Java is the same, although I'm not as familiar with it. I suspect it is exactly the same, because you just don't typically need more guarantees than read-acquire/write-release.
In your question you were on the right track in thinking that it is really all about relative order; you just had the orderings backwards (i.e. "the values that are read are at least as up-to-date as the reads before the barrier?" No: reads before the barrier are unimportant; it is the reads AFTER the barrier that are guaranteed to happen AFTER it, and vice versa for writes).
And please note that, as mentioned, reordering happens on both reads and writes, so using a barrier on only one thread and not the other WILL NOT WORK. That is, a write-release isn't enough without the read-acquire: even if you write the data in the right order, it could be read in the wrong order if you didn't use read barriers to go with the write barriers.
And lastly, note that lock-free programming and CPU memory architectures can actually be much more complicated than that, but sticking with acquire/release will get you pretty far.