Out-of-order execution and reordering: can code after the barrier execute before the barrier?

According to Wikipedia: a memory barrier, also known as a membar, memory fence, or fence instruction, is a type of barrier instruction that causes the central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued before the barrier are guaranteed to be performed before operations issued after it.

Typically, articles discuss something like the following (I will use monitors instead of membars):

class ReadWriteExample {
    int A = 0;
    int Another = 0;

    //thread1 runs this method
    void writer() {
        lock monitor1;    // a new value will be stored
        A = 10;           // store 10 to memory location A
        unlock monitor1;  // a new value is ready for the reader
        Another = 20;     // @see my question
    }

    //thread2 runs this method
    void reader() {
        lock monitor1;    // a new value will be read
        assert A == 10;   // load from memory location A
        print Another;    // @see my question
        unlock monitor1;  // a new value was just read
    }
}

But I am wondering: could the compiler or the processor reorder things so that this code prints 20? I do not need it to be guaranteed.

That is: by definition, operations issued before the barrier cannot be moved past it by the compiler, but can operations issued after the barrier sometimes occur before it? (Just as a possibility, not a guarantee.)
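For context, here is a runnable Java sketch of the example above (a sketch, not the original code: the class is renamed `Main`, the monitor is an explicit lock object, and the sequential driver in `main` is my own addition). When the writer thread is joined before the reader starts, `join()` itself provides a happens-before edge, so the reader is guaranteed to see both stores — printing 20 is clearly possible even without any reordering; the interesting question is what a concurrent reader may observe:

```java
public class Main {
    static int A = 0;
    static int another = 0;
    static final Object monitor = new Object();

    static void writer() {
        synchronized (monitor) {
            A = 10;              // store inside the lock
        }
        another = 20;            // store after the unlock (the question)
    }

    static int reader() {
        synchronized (monitor) {
            assert A == 10;      // holds here because writer has already run
            return another;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread w = new Thread(Main::writer);
        w.start();
        w.join();                // happens-before: all of writer's stores are now visible
        Thread r = new Thread(() -> System.out.println(reader()));
        r.start();
        r.join();                // prints 20
    }
}
```

Whether 20 can also be observed in a genuinely racy interleaving (reader running concurrently with writer) is exactly the reordering question asked above.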

thanks

+6
3 answers

My answer below deals only with the Java memory model. The question cannot really be answered for all languages, since each one may define its rules differently.

But I am wondering: could the compiler or the processor reorder things so that this code prints 20? I do not need it to be guaranteed.

Your question seems to be: "Can the store Another = 20 be reordered above the unlock of the monitor?"

The answer is yes, it can be. If you look at the JSR-133 Cookbook (for compiler writers), the first grid shows which reorderings are allowed.

In your case, in writer the first operation is a MonitorExit and the second is a NormalStore. The grid shows that, yes, this pair is allowed to reorder.

This is called Roach Motel semantics: memory accesses can be moved into a synchronized block, but cannot be moved out of it.
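A minimal sketch of what Roach Motel semantics permit (field and method names are my own; the two methods show the same program as written and as the JIT may legally execute it):

```java
public class Main {
    static int x;
    static int y;
    static final Object m = new Object();

    // As written in the source:
    static void asWritten() {
        synchronized (m) {
            x = 1;           // store inside the block
        }
        y = 2;               // NormalStore after the MonitorExit
    }

    // A transformation the JMM allows: the later store moves INTO the block.
    static void asPossiblyExecuted() {
        synchronized (m) {
            x = 1;
            y = 2;           // pulled inside; now published while the lock is held
        }
    }

    // NOT allowed: moving x = 1 OUT of the synchronized block.

    public static void main(String[] args) {
        asWritten();
        System.out.println(x + " " + y);   // single-threaded result is unchanged: 1 2
    }
}
```

The transformation is invisible to single-threaded code; only a concurrent observer can tell the difference, which is why the JMM is free to allow it.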


What about other languages? Well, the question is too broad to answer for all of them, since each may define its rules differently. If that is what you need, you should narrow your question.

+2

In Java there is the concept of "happens-before". You can read all the details in the Java Language Specification. A Java compiler or runtime may reorder code, but it must abide by the happens-before rules. These rules matter for a Java developer who wants fine-grained control over how their code may be reordered. I have been burned by reordering myself: it turned out I was referring to the same object through two different variables, and the runtime reordered my code without realizing the operations were on the same object. If I had had either a happens-before relationship between the two operations, or used the same variable for both, the reordering would not have occurred.

In particular:

From the above definitions it follows that:

An unlock on a monitor happens-before every subsequent lock on that monitor.

A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.

A call to start() on a thread happens-before any actions in the started thread.

All actions in a thread happen-before any other thread successfully returns from a join() on that thread.

The default initialization of any object happens-before any other actions (other than default writes) of a program.
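The volatile rule above can be demonstrated with the classic flag-publication idiom (a sketch; the field names are my own):

```java
public class Main {
    static int data = 0;
    static volatile boolean ready = false;  // the volatile write/read pair creates the edge

    public static void main(String[] args) throws InterruptedException {
        Thread producer = new Thread(() -> {
            data = 42;      // ordinary store...
            ready = true;   // ...made visible by the subsequent volatile store
        });
        producer.start();
        while (!ready) {    // volatile read; loops until the flag flips
            Thread.onSpinWait();
        }
        // The volatile write happens-before this read, so data must be 42 here:
        System.out.println(data);
    }
}
```

Without the volatile modifier on ready, the reader could legally spin forever or observe data == 0; the happens-before edge is what rules both outcomes out.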

+1

The short answer is yes, and it depends heavily on the compiler and the processor architecture. What you are describing is the definition of a race condition. Preemptive time-slice scheduling never interrupts an instruction mid-execution (you cannot have two writes to the same location at once), but a time slice can end between any two instructions; and how instructions actually execute, given pipeline ordering, depends on the architecture (outside the monitor block).

Now come the "it depends" complications. The CPU guarantees very little (see race conditions above). You can also look at NUMA (ccNUMA): a way of scaling CPU and memory access by grouping processors (nodes) with local RAM and a group owner, plus a special bus between the nodes.

The monitor does not prevent another thread from running. It only prevents it from entering the code between the monitor operations. So when the writer leaves the monitor section, it can execute the next statement regardless of whether another thread is inside the monitor. Monitors are gates that block entry, not schedulers. In addition, a time slice can interrupt the second thread right after the assert A == 10 statement, which allows another thread to change the value. Again, a time slice will not interrupt an instruction mid-execution. Always assume that threads run in perfect parallel.
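The point that a monitor only gates the code between its lock and unlock, rather than suspending other threads, can be sketched like this (the latch and all names are my own, not from the question):

```java
import java.util.concurrent.CountDownLatch;

public class Main {
    static final Object monitor = new Object();

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch outside = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            outside.countDown();         // outside any synchronized block: runs freely
            synchronized (monitor) {     // t would block only here, at the gate
            }
        });
        synchronized (monitor) {         // main holds the monitor...
            t.start();
            outside.await();             // ...yet t still executes its unguarded code
            System.out.println("t ran outside the monitor while main held it");
        }
        t.join();                        // t acquires and releases the monitor, then exits
    }
}
```

The await() completing while main still holds the lock is the whole demonstration: holding a monitor stops other threads only at the matching synchronized boundary, nowhere else.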

How does this apply to you? I am a bit out of date (sorry, it's C#/Java these days) on current Intel processors and how their pipelines work (hyper-threading, etc.). Some years ago I worked with a MIPS processor, which had (via compiler instruction ordering) the ability to execute an instruction placed sequentially AFTER a branch instruction (the delay slot). With that CPU/compiler combination, YES, what you describe can happen. If Intel offers something similar, then yes, it can happen there too, especially with NUMA (both Intel and AMD have it; I am most familiar with AMD's implementation).

My point is: if the threads were running on different NUMA nodes and accessing a shared memory location, this could happen. Of course, it is much harder to observe if the OS schedules everything within the same node.

Perhaps you can simulate this. I know that C++ on Microsoft Windows gives access to the NUMA APIs (I have played with them). See whether you can allocate memory across two nodes (placing one variable on each) and schedule the threads to run on specific nodes.

What happens in this model is that there are two paths to RAM. I suppose this is not what you had in mind; you were probably thinking of a single path/node model. In that case I fall back to the MIPS model described above.

I have assumed a processor with preemptive interrupts; there are others that use a yield model.

+1

Source: https://habr.com/ru/post/984856/

