Combining consecutive stores/loads of atomic variables

I'm referring to a (slightly dated) paper by Hans Boehm, under the "Atomic Operations" section. It mentions that the memory model (as proposed at the time) would not prevent an optimizing compiler from combining a sequence of loads, or of stores, of the same variable into a single load. His example is as follows (updated to what is hopefully correct current syntax):

Given

atomic<int> v; 

The code

 while( v.load( memory_order_acquire ) ) { ... } 

could be optimized to:

 int a = v.load(memory_order_acquire); while(a) { ... } 

Obviously that would be bad, as he states. Now my question: since the paper is a bit old, does the current C++0x memory model prevent this type of optimization, or is it still technically allowed?

My reading of the standard leans towards it being disallowed, but the use of "acquire" semantics makes it less clear. For example, if it were "seq_cst", the model seems to guarantee that the load must take part in the total order of accesses, so loading the value only once would seem to violate that ordering (since it breaks the sequenced-before/happens-before relationship).

For acquire, I interpret 29.3.2 to mean that this optimization cannot happen, since any "release" operation must be observed by the "acquire" operation. Performing only one acquire would seem to be invalid.

So my question is: does the current model (in the pending standard) prohibit this type of optimization? If so, which part specifically forbids it? If not, does using a volatile atomic solve the problem?

And as a bonus: if the load operation has "relaxed" ordering, is the optimization then allowed?

2 answers

The C++0x standard attempts to prohibit this optimization.

The relevant wording is from 29.3p13:

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

If the thread doing the load only ever issues one load instruction, this is violated: if it misses the write the first time, it will never see it. It does not matter which memory ordering is used for the load; the same holds for memory_order_seq_cst and memory_order_relaxed.

However, the following optimization is allowed, provided nothing in the loop body forces an ordering:
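To make the point concrete, here is a minimal sketch of the kind of loop in question. It is not taken from the paper or the standard; the helper name spin_until_cleared and the 10 ms delay are assumptions chosen for illustration. It only terminates because the loop condition keeps re-reading the atomic, so the writer's store is eventually observed.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper: spins until another thread clears the flag and
// returns how many times the loop body ran.
long spin_until_cleared() {
    std::atomic<int> v{1};
    long iterations = 0;

    // Writer thread: clears the flag after a short, arbitrary delay.
    std::thread writer([&v] {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        v.store(0, std::memory_order_release);
    });

    // Every evaluation of the condition is a separate atomic load.
    // If the load were hoisted out of the loop and it read 1, the loop
    // could never exit, which is what 29.3p13 is meant to rule out.
    while (v.load(std::memory_order_acquire)) {
        ++iterations;
    }

    writer.join();
    return iterations;
}
```

Calling spin_until_cleared() from main should return after roughly the writer's delay. The iteration count itself depends on the machine and scheduler, so nothing should rely on its value.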

 while( v.load( memory_order_acquire ) ) {
     for( unsigned __temp = 0; __temp < 100; ++__temp ) {
         // original loop body goes here
     }
 }

That is, the compiler can generate code that performs the actual loads arbitrarily infrequently, provided it still performs them. This is even allowed for memory_order_seq_cst if there are no other memory_order_seq_cst operations in the loop, since it is equivalent to running 100 iterations between any memory accesses made by other threads.

As an aside, memory_order_acquire does not have the effect you describe. The load is not required to see release operations (beyond what 29.3p13 above asks for); rather, if it does see the value written by a release operation, then that imposes visibility constraints on other accesses.
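A minimal sketch of what acquire does guarantee, assuming a simple flag-plus-payload setup (the names read_payload_after_flag, payload and ready are made up for this illustration): once the acquire load reads the value written by the release store, the plain write made before that store is guaranteed to be visible.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: returns the payload value the consumer observed
// after its acquire load saw the producer's release store.
int read_payload_after_flag() {
    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};   // flag published with release
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // ordinary write
        ready.store(true, std::memory_order_release);  // publish it
    });

    std::thread consumer([&] {
        // Acquire does not force the store to be seen sooner. But once
        // a load does read true, the write to payload happens-before
        // the read below.
        while (!ready.load(std::memory_order_acquire)) {
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

With memory_order_relaxed on both sides the loop would still be expected to terminate eventually, but the read of payload would be a data race.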


From the paper you linked:

Volatile guarantees that the right number of memory operations are performed.

The standard says essentially the same thing:

Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.

This has always been the case, I believe since the very first C compiler by Dennis Ritchie. It has to be this way, because memory-mapped I/O registers would not work otherwise. To read two characters from the keyboard, you need to read the corresponding memory register twice. If the compiler had a different idea about the number of reads it should perform, that would be too bad!
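As a rough sketch of the two-reads case, assuming an ordinary volatile int as a stand-in for a memory-mapped register (fake_register and read_twice_sum are hypothetical names; real device access would use a platform-specific address):

```cpp
// Hypothetical stand-in for a device register; a real one would live at
// a fixed, platform-specific address rather than in ordinary memory.
volatile int fake_register = 7;

// Reads the "register" twice. Because the object is volatile, each read
// is an access the compiler must actually perform, so the two reads
// may not be merged into one.
int read_twice_sum() {
    int first = fake_register;   // first volatile read
    int second = fake_register;  // second volatile read
    return first + second;
}
```

Here both reads return the same value because nothing changes the variable in between; with real hardware the second read could return the next character.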


Source: https://habr.com/ru/post/892664/

