Does the scenario with thread #2b create a correct release sequence?
Yes, according to your quote from the standard.
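To make that concrete, here is a minimal sketch of the kind of scenario being discussed. It is not the code from the question: the names `payload`, `flag` and `run_release_sequence_demo` are hypothetical, and I am assuming that thread #2b amounts to a relaxed RMW on the flag that reads the value written by the release store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names; this is an illustration, not the question's code.
int payload = 0;              // plain, non-atomic data
std::atomic<int> flag{0};

int run_release_sequence_demo() {
    payload = 0;
    flag.store(0, std::memory_order_relaxed);
    int seen = -1;

    std::thread t1([] {
        payload = 42;                               // A: plain write
        flag.store(1, std::memory_order_release);   // B: heads the release sequence
    });

    std::thread t2([] {
        // Relaxed RMW: it succeeds only once it reads the 1 written by B,
        // so its write of 2 continues the release sequence headed by B.
        int expected = 1;
        while (!flag.compare_exchange_weak(expected, 2,
                                           std::memory_order_relaxed)) {
            expected = 1;                           // reset after a failed attempt
            std::this_thread::yield();
        }
    });

    std::thread t3([&seen] {
        // C: acquire load that eventually reads the value written by the RMW.
        while (flag.load(std::memory_order_acquire) != 2) {
            std::this_thread::yield();
        }
        seen = payload;   // D: B synchronizes with C, so A happens before D
    });

    t1.join();
    t2.join();
    t3.join();
    return seen;
}
```

Calling `run_release_sequence_demo()` should return 42: the acquire load in `t3` reads a value written by an RMW in the release sequence headed by B, so the plain read of `payload` is race-free. If `t2` used a separate relaxed load followed by a relaxed store instead, that guarantee would be lost.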
What are the specific properties of a read-modify-write operation that ensure this scenario is correct?
Well, the somewhat circular answer is that the only important specific property is that "the C++ standard defines it that way."
As a practical matter, one may ask why the standard defines it this way. I don't think you will find that the answer has a deep theoretical basis: the committee could also have defined it so that the RMW does not participate in the release sequence, or (perhaps with greater difficulty) so that both the RMW and the separate mo_relaxed load and store participate in the release sequence, without compromising the soundness of the model.
They already give a performance-related reason why they did not choose the latter approach:
"Such a requirement would sometimes interfere with efficient implementation."
In particular, on any hardware platform that allows loads and stores to be reordered with one another, it would mean that even mo_relaxed loads and/or stores might require barriers! Such platforms exist today. Even on more strongly ordered platforms, it could inhibit compiler optimizations.
So why didn't they take the other "consistent" approach, the one that does not require mo_relaxed RMWs to participate in the release sequence? Probably because existing hardware implementations of RMW operations already provide such guarantees, and the nature of RMW operations makes it likely that this will remain true in the future. In particular, as Peter points out in the comments above, RMW operations, even with mo_relaxed, are conceptually and practically¹ stronger than separate loads and stores: they would be quite useless if they did not have a consistent total order.
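The "consistent total order" point can be illustrated with a small sketch. The function name `relaxed_rmw_is_lossless` and its parameters are hypothetical and chosen only for this example. Several threads increment one counter with relaxed `fetch_add`; because every RMW reads the latest value in the counter's modification order, no increment is lost and no two calls return the same prior value.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: returns true if every relaxed fetch_add observed a
// distinct prior value and the final count equals the number of increments.
bool relaxed_rmw_is_lossless(int num_threads, int per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::vector<int>> observed(num_threads);
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, &observed, t, per_thread] {
            observed[t].reserve(per_thread);
            for (int i = 0; i < per_thread; ++i) {
                // Relaxed RMW: atomic, but imposes no ordering on other memory.
                observed[t].push_back(
                    counter.fetch_add(1, std::memory_order_relaxed));
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }

    const int total = num_threads * per_thread;
    if (counter.load(std::memory_order_relaxed) != total) {
        return false;   // an increment was lost
    }
    std::vector<bool> seen(total, false);
    for (const auto& values : observed) {
        for (int v : values) {
            if (v < 0 || v >= total || seen[v]) {
                return false;   // out of range or duplicated prior value
            }
            seen[v] = true;
        }
    }
    return true;
}
```

For example, `relaxed_rmw_is_lossless(4, 10000)` should return true. The same loop written as a separate relaxed load followed by a relaxed store of the incremented value could lose updates, which is the sense in which RMWs are stronger.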
Once you accept that this is how hardware works, it makes sense from a performance standpoint to align the standard with it: if it were not aligned, people would use more restrictive orderings such as mo_acq_rel just to get the release sequence guarantee, and on real hardware with a weakly ordered CAS that does not come for free.
¹ The "practically" part means that even the weakest forms of RMW instructions are usually relatively "expensive" operations that take dozens of cycles or more on modern hardware, while mo_relaxed loads and stores generally just compile to plain loads and stores in the target ISA.