How do memory barriers relate to volatile reads?

Some languages provide a volatile keyword, which is described as performing a "read memory barrier" before reading the memory that backs the variable.

A read memory barrier is typically described as a way to ensure that the CPU performs the reads before the barrier before it performs the reads after the barrier. However, by that definition, it would seem a stale value could still be read. In other words, performing reads in a particular order does not seem to require consulting main memory or other CPUs to ensure that the values read after the barrier actually reflect the latest values in the system at the time of the barrier, or values written after the barrier.

So, does volatile really guarantee that an up-to-date value is read, or merely (gasp!) that the values read are at least as recent as any read before the barrier? Or some other interpretation? What are the practical implications of the answer?

+48
multithreading volatile memory-barriers
Nov 24 '09 at 2:39
2 answers

There are read barriers and write barriers; acquire barriers and release barriers. And more (io vs memory, etc.).

The barriers are not there to control the "latest" value or the "freshness" of values. They are there to control the relative ordering of memory accesses.

Write barriers control the ordering of writes. Because memory writes are slow (compared to CPU speed), there is usually a write-request queue where writes are posted before they actually happen. Although they are posted to the queue in order, while inside the queue the writes may be reordered. (So maybe "queue" isn't the best name...) Unless you use write barriers to prevent the reordering.

Read barriers control the ordering of reads. Because of speculative execution (the CPU looks ahead and loads from memory early), and because of the existence of the store buffer (the CPU will read a value from the store buffer instead of memory if it's there - i.e. the CPU figures: I just wrote x = 5, so why read it back, just keep seeing it as the 5 still sitting in the store buffer), reads may happen out of order.

This is true regardless of what the compiler tries to do with the ordering of the generated code. i.e. 'volatile' in C++ won't help here, because it only tells the compiler to emit code that re-reads the value from "memory"; it does NOT tell the CPU how/where to read it (i.e. "memory" is many different things at the CPU level).
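To make that contrast concrete, here is a minimal C++11 sketch (the names data, v_flag, a_flag are illustrative, not from the question): a volatile flag only constrains the compiler, while an atomic flag with acquire/release ordering also constrains the CPU.

    #include <atomic>

    int data = 0;
    volatile bool v_flag = false;     // compiler: re-read every time; CPU: no ordering promised
    std::atomic<bool> a_flag{false};  // compiler AND CPU ordering, via memory_order

    void writer() {
        data = 42;
        v_flag = true;                                  // no ordering: data may land after the flag
        a_flag.store(true, std::memory_order_release);  // ordering: data is published first
    }

    void reader_volatile() {
        while (!v_flag) { }  // the compiler must reload v_flag each iteration, but
                             // the CPU may still reorder the read of data around it
        int x = data;        // formally a data race; may observe a stale value
        (void)x;
    }

    void reader_atomic() {
        while (!a_flag.load(std::memory_order_acquire)) { }
        int x = data;        // guaranteed to see 42: the acquire load pairs with
        (void)x;             // the release store in writer()
    }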

Thus, read/write barriers place blocks that prevent reordering in the read/write queues (the read side usually isn't literally a queue, but the reordering effects are the same).

What kinds of blocks? - acquire and/or release blocks.

Acquire - e.g. read-acquire(x) will add the read of x to the read queue and flush the queue (well, not actually flush the whole queue, but add a marker saying don't reorder anything to before this read, which is as if the queue were flushed). So later (in code order) reads can be reordered, just not to before the read of x.

Release - e.g. write-release(x, 5) will flush (or mark) the queue first, then add the write request to the write queue. So earlier writes won't be reordered to happen after x = 5, but note that later writes can be reordered to before x = 5.

Note that I paired read with acquire and write with release because that is typical, but other combinations are possible.

Acquire and release are considered "half-barriers" or "half-fences" because they only stop reordering in one direction.

A full barrier (or full fence) acts as both an acquire and a release, i.e. no reordering across it.
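In C++11 these three flavors map directly onto std::atomic_thread_fence; a minimal sketch (the comments paraphrase the guarantees informally, not the standard's exact wording):

    #include <atomic>

    void fence_flavors() {
        // half-fence: reads before it are not reordered with
        // reads or writes after it
        std::atomic_thread_fence(std::memory_order_acquire);

        // half-fence: reads and writes before it are not reordered
        // with writes after it
        std::atomic_thread_fence(std::memory_order_release);

        // full fence: acquire and release combined; nothing is
        // reordered across it
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }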

Typically, for lock-free programming, or for C# or Java 'volatile', what you want is read-acquire and write-release.


    void threadA() {
        foo->x = 10;
        foo->y = 11;
        foo->z = 12;
        write_release(foo->ready, true);
        bar = 13;
    }

    void threadB() {
        w = some_global;
        ready = read_acquire(foo->ready);
        if (ready) {
            q = w * foo->x * foo->y * foo->z;
        } else
            calculate_pi();
    }

So, first of all, this is a bad way to program with threads; locks would be safer. But just to illustrate barriers...

After threadA() writes foo, it needs to write foo->ready LAST, really last, else other threads might see foo->ready set early and read the wrong x/y/z values. So we use write_release on foo->ready, which, as mentioned above, effectively "flushes" the write queue (ensuring x, y, z are committed), then adds the ready = true request to the queue, and then adds the bar = 13 request. Note that since we only used a release barrier (not a full one), bar = 13 may actually get written before ready. But we don't care! i.e. we are assuming bar is not shared data.

Now threadB() needs to know that when we say "ready" we really mean ready. So we do read_acquire(foo->ready). That read is added to the read queue, AND the queue is flushed. Note that w = some_global may also (still) be in the queue. So foo->ready may be read before some_global. But again, we don't care, because it isn't part of the important data we are being so careful about. What we do care about is foo->x/y/z; those reads are added to the read queue after the flush/acquire marker, guaranteeing that they happen only after foo->ready is read.
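For reference, the same pattern sketched with C++11 atomics, where store(release)/load(acquire) stand in for the write_release/read_acquire pseudo-calls above (names carried over from the pseudocode; the calculate_pi branch is omitted):

    #include <atomic>

    struct Foo { int x, y, z; std::atomic<bool> ready{false}; };

    Foo foo;
    int bar = 0, some_global = 7, w = 0, q = 0;

    void threadA() {
        foo.x = 10;
        foo.y = 11;
        foo.z = 12;
        foo.ready.store(true, std::memory_order_release);  // x/y/z commit before ready
        bar = 13;                                          // may still move before ready
    }

    void threadB() {
        w = some_global;                                   // may move after the load below
        if (foo.ready.load(std::memory_order_acquire))     // x/y/z are read after this
            q = w * foo.x * foo.y * foo.z;
    }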

Note also that, typically, these are the exact same barriers used for locking and unlocking a mutex/CriticalSection/etc. (i.e. acquire on lock(), release on unlock()).

So,

  • I am pretty sure this (i.e. acquire/release) is exactly what the MS docs say for reads/writes of 'volatile' variables in C# (and possibly for MS C++, but that is non-standard). See http://msdn.microsoft.com/en-us/library/aa645755(VS.71).aspx including "A volatile read has 'acquire semantics'; that is, it is guaranteed to occur prior to any references to memory that occur after it..."

  • I think Java is the same, although I'm not as familiar with it. I suspect it is exactly the same, because you just don't typically need more guarantees than read-acquire/write-release.

  • In your question you were on the right track when thinking that it is really all about relative ordering - you just had the orderings backwards (i.e. "are the values read at least as fresh as anything read before the barrier?" - no; what is read before the barrier is unimportant; it is reads AFTER the barrier that are guaranteed to happen AFTER, and vice versa for writes).

  • And note that, as mentioned, reordering happens on both reads and writes, so using a barrier on one thread but not the other WON'T WORK. i.e. the write-release is not enough without the read-acquire. i.e. even if you write in the correct order, it can be read in the wrong order if you do not use the read barriers to go with the write barriers.

  • And finally, note that lock-free programming and CPU memory architectures can actually be far trickier than that, but sticking with acquire/release will get you pretty far.

+93
Nov 24 '09 at 6:23

volatile in most programming languages does not imply a real read memory barrier in the CPU; it only orders the compiler not to optimize reads away by caching the value in a register. This means the reading process/thread will see the value "eventually". A common technique is to declare a volatile boolean flag that is set in a signal handler and checked in the main program loop.

In contrast, CPU memory barriers are provided directly via CPU instructions, or implied by certain assembler mnemonics (such as the lock prefix on x86). They are used, for example, when talking to hardware devices, where the order of reads and writes to memory-mapped I/O registers matters, or to synchronize memory accesses in a multiprocessor environment.

To answer your question - no, a memory barrier does not guarantee the "latest" value, but it does guarantee the ordering of memory access operations. This is crucial, for example, in lock-free programming.
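As a concrete illustration of that flag technique, a minimal C++ sketch (sig_atomic_t is the type the C and C++ standards sanction for this use):

    #include <csignal>

    // volatile: the compiler must re-read the flag on every loop iteration;
    // sig_atomic_t: a read or write cannot be torn by a signal arriving mid-access
    volatile std::sig_atomic_t stop_requested = 0;

    void on_sigint(int) { stop_requested = 1; }

    int main() {
        std::signal(SIGINT, on_sigint);
        while (!stop_requested) {
            // ... main program loop ...
        }
    }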

Here is one primer on CPU memory barriers.

+8
Nov 24 '09 at 2:59


