Using atomic read-modify-write operations in a release sequence

Question

Let's say I create an object of type Foo in thread #1 and want to be able to access it in thread #3.
I can try something like:

    std::atomic<int> sync{10};
    Foo *fp;

    // thread 1: modifies sync: 10 -> 11
    fp = new Foo;
    sync.store(11, std::memory_order_release);

    // thread 2a: modifies sync: 11 -> 12
    while (sync.load(std::memory_order_relaxed) != 11);
    sync.store(12, std::memory_order_relaxed);

    // thread 3
    while (sync.load(std::memory_order_acquire) != 12);
    fp->do_something();

- The store-release in thread #1 orders Foo with the update to 11
- Thread #2a increments sync to 12 non-atomically (a separate relaxed load and store rather than a single read-modify-write)
- The synchronizes-with relationship between threads #1 and #3 is established only when #3 loads 11

The scenario is broken because thread #3 spins until it loads 12, which may arrive out of order with respect to 11, and Foo is not ordered with 12 (because of the relaxed operations in thread #2a).
This is somewhat counter-intuitive, since the modification order of sync is 10 → 11 → 12.

The standard says (§ 1.10.1-6):

An atomic store-release synchronizes with a load-acquire that takes its value from the store (29.3). [Note: Except in the specified cases, reading a later value does not necessarily ensure visibility as described below. Such a requirement would sometimes interfere with efficient implementation. -end note]

§ 1.10.1-5 also says:

A release sequence headed by a release operation A on an atomic object M is a maximal contiguous sub-sequence of side effects in the modification order of M, where the first operation is A, and every subsequent operation
- is performed by the same thread that performed A, or
- is an atomic read-modify-write operation.

Now thread #2a is modified to use an atomic read-modify-write operation:

    // thread 2b: modifies sync: 11 -> 12
    int val;
    while ((val = 11) && !sync.compare_exchange_weak(val, 12, std::memory_order_relaxed));
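As a rough, self-contained sketch of the thread #2b variant, the three threads can be put together as below. Several things here are assumptions for illustration and not part of the snippets above: the one-field definition of `Foo`, the function name `run_release_sequence_demo`, the use of `std::thread` with lambdas, and the `yield()` calls in the spin loops.

```cpp
#include <atomic>
#include <thread>

// Hypothetical payload type; the question never shows Foo's definition.
struct Foo {
    int value = 42;
    int do_something() const { return value; }
};

// Runs the three-thread hand-off once and returns what thread 3 read from Foo.
int run_release_sequence_demo() {
    std::atomic<int> sync{10};
    Foo* fp = nullptr;
    int observed = 0;

    // thread 1: publishes Foo, sync: 10 -> 11 (heads the release sequence)
    std::thread t1([&] {
        fp = new Foo;
        sync.store(11, std::memory_order_release);
    });

    // thread 2b: sync: 11 -> 12 with a relaxed read-modify-write
    std::thread t2([&] {
        int expected = 11;
        while (!sync.compare_exchange_weak(expected, 12,
                                           std::memory_order_relaxed)) {
            expected = 11;  // a failed CAS overwrites expected with the current value
            std::this_thread::yield();
        }
    });

    // thread 3: the acquire load of 12 reads a value written by an RMW
    // that belongs to the release sequence headed by thread 1's store
    std::thread t3([&] {
        while (sync.load(std::memory_order_acquire) != 12) {
            std::this_thread::yield();
        }
        observed = fp->do_something();
    });

    t1.join();
    t2.join();
    t3.join();
    delete fp;
    return observed;
}
```

Calling `run_release_sequence_demo()` from `main` exercises the hand-off once. A single run passing does not by itself demonstrate that the ordering is correct; that rests on the release sequence wording quoted above.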

If this release sequence is correct, Foo is synchronized with thread #3 when it loads either 11 or 12. My questions about the use of an atomic read-modify-write are:

Does the scenario with thread #2b constitute a correct release sequence?

And if so:

What are the specific properties of the read-modify-write operation that ensure this scenario is correct?


c++ multithreading atomic c++11 memory-model

Lwimsey Aug 15 '17 at 14:00


1 answer

BeeOnRope · Answer 1 · 2017-09-01T21:41:16+0000

Does the scenario with thread #2b constitute a correct release sequence?

Yes, per your quote from the standard.

What are the specific properties of a read-modify-write operation to ensure that this scenario is correct?

Well, the somewhat circular answer is that the only important specific property is that "the C++ standard defines it that way."

As a practical matter, one may ask why the standard defines it this way. I don’t think you will find that the answer has a deep theoretical basis: I think the committee could also have defined it so that the RMW does not participate in the release sequence, or (perhaps with more difficulty) defined it so that both the RMW and the separate mo_relaxed load and store participate in the release sequence, without compromising the “soundness” of the model.

They already give a performance-related reason for not choosing the latter approach:

Such a requirement would sometimes interfere with efficient implementation.

In particular, on any hardware platform that allows loads and stores to be reordered, it would imply that even mo_relaxed loads and/or stores might require barriers! Such platforms exist today. Even on more strongly ordered platforms, it may inhibit compiler optimizations.

So why didn't they then take the other “consistent” approach of not requiring a mo_relaxed RMW to participate in the release sequence? This is probably because existing hardware implementations of RMW operations provide such guarantees, and the nature of RMW operations makes it likely that this will remain true in the future. In particular, as Peter points out in the comments above, RMW operations, even with mo_relaxed, are conceptually and practically¹ stronger than separate loads and stores: they would be quite useless if they did not have a consistent total order.

Once you accept that this is how the hardware works, it makes sense from a performance angle to align the standard with it: if you didn't, you would have people using more restrictive orderings, such as mo_acq_rel, just to get the release sequence guarantees, and on real hardware that has a weakly ordered CAS, that does not come for free.
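For comparison, here is a minimal sketch of the more restrictive alternative: the same hand-off with the CAS using acquire-release ordering on success. In that case visibility follows from two ordinary synchronizes-with edges (thread 1 to the CAS, then the CAS to thread 3) and does not depend on the release sequence rule for relaxed RMWs. The function name `run_acq_rel_handoff`, the plain `int` payload standing in for Foo, and the `yield()` calls are assumptions made for this illustration.

```cpp
#include <atomic>
#include <thread>

// Same hand-off as in the question, but the middle thread's CAS uses acq_rel.
// Returns the payload value seen by the final thread.
int run_acq_rel_handoff() {
    std::atomic<int> sync{10};
    int payload = 0;   // hypothetical stand-in for the Foo object
    int observed = 0;

    // thread 1: writes the payload, then sync: 10 -> 11 with release
    std::thread t1([&] {
        payload = 7;
        sync.store(11, std::memory_order_release);
    });

    // middle thread: sync: 11 -> 12; acquires from thread 1 and releases to thread 3
    std::thread t2([&] {
        int expected = 11;
        while (!sync.compare_exchange_weak(expected, 12,
                                           std::memory_order_acq_rel,    // on success
                                           std::memory_order_relaxed)) { // on failure
            expected = 11;  // a failed CAS overwrites expected with the current value
            std::this_thread::yield();
        }
    });

    // thread 3: acquire load of 12 synchronizes with the CAS's release half
    std::thread t3([&] {
        while (sync.load(std::memory_order_acquire) != 12) {
            std::this_thread::yield();
        }
        observed = payload;
    });

    t1.join();
    t2.join();
    t3.join();
    return observed;
}
```

On targets where a relaxed CAS is cheaper than an acquire-release one, this version may pay for ordering that the release sequence rule already provides with mo_relaxed.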

¹ The “practically” part means that even the weakest forms of RMW instructions are usually relatively “expensive” operations that take dozens of cycles or more on modern hardware, while mo_relaxed loads and stores generally just compile to plain loads and stores in the target ISA.
