Can the JIT collapse two volatile reads into one in certain expressions?

Suppose we have a volatile int a . One thread executes

 while (true) { a = 1; a = 0; } 

and another thread

 while (true) { System.out.println(a+a); } 

Now, would it be illegal for the JIT compiler to emit assembly corresponding to 2*a instead of a+a ?

On the one hand, the very purpose of a volatile read is that it should always be fresh from memory.

On the other hand, there is no synchronization point between the two reads, so I don't see that it would be illegal to evaluate a+a atomically, and in that case I don't see how an optimization like 2*a would break the specification.
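For concreteness, here is a bounded, self-contained rendering of the two loops. The class name and iteration counts are mine, added only so the program terminates; the loops in the question run forever:

```java
// Bounded rendering of the question's two loops (names and bounds illustrative).
public class VolatileSum {
    static volatile int a;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                a = 1;
                a = 0;
            }
        });
        writer.start();

        for (int i = 0; i < 1_000_000; i++) {
            int sum = a + a; // two separate volatile reads
            // Each read yields 0 or 1, so the sum must be 0, 1 or 2.
            // A JIT emitting 2*a would narrow the possible sums to {0, 2}.
            if (sum < 0 || sum > 2) {
                throw new AssertionError("impossible sum: " + sum);
            }
        }
        writer.join();
        System.out.println("all sums in {0, 1, 2}");
    }
}
```

Whether a sum of 1 is actually observed depends on scheduling; the point of the question is only whether the JIT may rule it out.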

Links to JLS are welcome.

+28
java volatile memory-model jit java-memory-model
Dec 19 '14 at 13:15
5 answers

Short answer:

Yes, this optimization is allowed. Collapsing two sequential reads produces the observable behavior of the sequence being atomic, and it does not appear as a reordering of operations. Any sequence of actions performed on a single thread of execution may be executed as an atomic unit. In general, it is difficult to ensure that a sequence of operations executes atomically, and doing so rarely yields a performance gain, since most execution environments impose overhead for executing operations atomically.

In the example given in the original question, the sequence of operations in question is:

 read(a) read(a) 

Performing these operations atomically guarantees that the value read on the first line equals the value read on the second line. Furthermore, it means that the value read on the second line is the value contained in a at the time of the first read (and vice versa, since both read operations occurred at the same time according to the observable execution state of the program). The optimization under consideration, reusing the value of the first read for the second read, is equivalent to the compiler and/or JIT executing the sequence atomically, and is therefore valid.




Original longer answer:

The Java memory model describes operations using a happens-before partial ordering. To express the constraint that the first read r1 and the second read r2 of a cannot be collapsed, you need to show that some operation is semantically required to appear between them.

The operations on the thread performing r1 and r2 are as follows:

 --> r(a) --> r(a) --> add --> 

To express the requirement that something (say y ) lie between r1 and r2 , you need to require that r1 happens-before y and y happens-before r2 . As it happens, there is no rule where a read operation appears on the left-hand side of the happens-before relationship. The closest you can get is to say that y happens-before r2 , but the partial order would still allow y to also occur before r1 , thereby collapsing the reads.

Since no scenario requires an operation to fall between r1 and r2 , you may declare that no operation ever appears between r1 and r2 without violating the required semantics of the language. Using a single read operation is equivalent to this declaration.

Edit: My answer has received some downvotes, so I will go into further detail.

Here are some related questions:

  • Is a Java or JVM compiler required to collapse these read operations?

No. The expressions a and a used in the addition are not constant expressions, so there is no requirement that they be collapsed.

  • Does the JVM collapse these read operations?

To this I am not sure of the answer. By compiling the program and using javap -c , it is easy to see that the Java compiler does not collapse these reads. Unfortunately, proving that the JVM (or even the processor itself) does not perform the collapse is not so simple.

  • Should the JVM collapse these read operations?

Probably not. Every optimization takes time to apply, so there is a balance between the time spent analyzing the code and the expected benefit. Some optimizations, such as eliminating array bounds checks or null reference checks, have proven to have extensive benefits for real-world applications. The only case where this particular optimization has a chance of improving performance is when two identical reads appear back to back.

In addition, as discussed in the responses to this answer and in the other answers, this particular change could lead to an unexpected change in behavior that users may not want.
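A minimal class for that javap experiment might look like this (the class and method names are mine). Compiling it and disassembling with javap -c shows two separate getstatic instructions feeding the iadd, i.e. javac itself does not collapse the reads:

```java
// Compile and inspect: javac Reads.java && javap -c Reads
// The bytecode for sum() contains getstatic, getstatic, iadd:
// javac performs no common-subexpression elimination here.
class Reads {
    static volatile int a;

    static int sum() {
        return a + a; // two volatile reads at the bytecode level
    }
}
```

Whether the JIT later collapses those two getstatic loads is exactly the open question discussed above.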

Edit 2: Regarding Rafael's claim that there exist two read operations that cannot be reordered: that statement is meant to emphasize that caching the read of a in the following sequence can produce an incorrect result:

 a1 = read(a)
 b1 = read(b)
 a2 = read(a)
 result = op(a1, b1, a2)

Suppose a and b initially have the default value 0, and the first thread executes only the first read(a) .

Now suppose another thread executes the following sequence:

 a = 1 b = 1 

Finally, suppose the first thread executes the rest of its sequence, starting with read(b) . If the initially read value of a were cached, you would end up with the following call:

 op(0, 1, 0) 

This is incorrect. Since the updated value of a was stored before the write to b , there is no way to read the value b1 = 1 and then read the value a2 = 0 . Without caching, the correct sequence of events leads to the following call:

 op(0, 1, 1) 

However, if you ask "Is there a way to allow the read of a to be cached?", the answer is yes. If all three reads in the first thread's sequence can be performed as an atomic unit, then caching the value is allowed. Although synchronizing across multiple variables is difficult and rarely provides an opportunistic optimization advantage, it is conceivable that an exception could arise. For example, suppose a and b are each 4 bytes and appear sequentially in memory, with a aligned on an 8-byte boundary. A 64-bit process could implement the sequence read(a) read(b) as a single atomic 64-bit load, which would allow the value of a to be cached (effectively treating all three reads as one atomic operation, not just the first two).
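That problematic interleaving can be replayed deterministically in a single thread. This is only a toy model of the schedule described above, not an actual race; the encoding used by op here is my own, chosen to make the three values easy to inspect:

```java
// Single-threaded replay of the interleaving discussed above.
class InterleavingDemo {
    static int a = 0, b = 0;

    // Encodes the triple (a1, b1, a2) as three decimal digits.
    static int op(int a1, int b1, int a2) {
        return a1 * 100 + b1 * 10 + a2;
    }

    public static void main(String[] args) {
        int a1 = a;       // thread 1: first read(a)  -> 0
        a = 1;            // thread 2: a = 1
        b = 1;            // thread 2: b = 1
        int b1 = b;       // thread 1: read(b)        -> 1
        int a2 = a;       // thread 1: second read(a) -> 1
        int cached = a1;  // a caching implementation would reuse a1 -> 0

        System.out.println(op(a1, b1, a2));     // the correct call: op(0, 1, 1)
        System.out.println(op(a1, b1, cached)); // the forbidden call: op(0, 1, 0)
    }
}
```

The last line shows the result that caching would produce, which the memory model rules out once b1 = 1 has been observed.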

+13
Dec 20 '14 at 0:02

In my initial answer, I argued against the legality of the proposed optimization. I backed this up mainly with information from the JSR-133 cookbook, which states that a volatile read must not be reordered with another volatile read, and which further notes that a cached read is to be treated as a reordering. The latter statement is, however, formulated with some ambiguity, which is why I went through the formal definition of the JMM, where I did not find such an indication. Therefore, I would now argue that the optimization is allowed. However, the JMM is quite complex, and the discussion on this page indicates that this corner case might be decided differently by someone with a deeper understanding of the formalism.

Denote thread 1 to execute

 while (true) {
     System.out.println(a   // r_1
                      + a); // r_2
 }

and thread 2 to execute:

 while (true) {
     a = 0; // w_1
     a = 1; // w_2
 }

The two reads r_i and the two writes w_i of a are synchronization actions, since a is volatile (JLS 17.4.2). They are external actions, since the variable a is used in multiple threads. These actions are contained in the set of all actions A. There is a total order over all synchronization actions, the synchronization order, which is consistent with the program order of thread 1 and thread 2 (JLS 17.4.4). From the definition of the synchronizes-with partial order, there is no edge defined for this order in the above code. As a consequence, the happens-before order reflects only the intra-thread semantics of each thread (JLS 17.4.5).

With this, we define the write-seen function W by W(r_i) = w_2 and the value-written function V by V(w_2) = 1 (JLS 17.4.6). I have taken some liberty and eliminated w_1 , since that simplifies this sketch of a formal proof. The question is whether the proposed execution E is well-formed (JLS 17.4.7). The proposed execution E obeys intra-thread semantics, is happens-before consistent, obeys the synchronization order, and each read observes a consistent write. The causality requirements are trivially verified (JLS 17.4.8). Nor do I see why the rules for non-terminating executions would be relevant, since the loop covers the entire code under discussion (JLS 17.4.9) and we do not need to distinguish observable actions.

In summary, I cannot find any indication of why this optimization would be forbidden. Nevertheless, the HotSpot VM does not apply it to volatile reads, as can be seen with -XX:+PrintAssembly . I assume, however, that the performance benefits would be negligible, and that this pattern is rarely observed.

Note: After going through the Java memory model pragmatics (several times), I am quite sure this reasoning is correct.

+11
Dec 21 '14 at 2:42

A slightly modified version of the OP's problem:

  volatile int a;

  // Thread 1
  while (true) {
      a = some_oddNumber;
      a = some_evenNumber;
  }

  // Thread 2
  while (true) {
      if (isOdd(a + a)) {
          break;
      }
  }

If the above code were executed in a sequentially consistent manner, there would exist a valid sequentially consistent execution that breaks thread 2's while loop.

If the compiler optimizes a + a into 2 * a , then thread 2's while loop will never break.

Thus, the above optimization would rule out one particular execution that would have been possible under sequential execution.
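A bounded sketch of that modified program follows; the concrete odd/even values and iteration caps are mine, so the demo terminates either way. The key property is checkable directly: with two distinct reads the sum can be odd when the reads straddle a write, whereas the collapsed form 2 * a is even for every int:

```java
class OddSum {
    static volatile int a;

    static boolean isOdd(int x) {
        return (x & 1) == 1;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 5_000_000; i++) {
                a = 3; // some_oddNumber
                a = 4; // some_evenNumber
            }
        });
        writer.start();

        boolean sawOdd = false;
        for (int i = 0; i < 5_000_000 && !sawOdd; i++) {
            sawOdd = isOdd(a + a); // odd only if the reads straddle a write
        }
        writer.join();
        System.out.println(sawOdd ? "loop would break" : "no odd sum seen");

        // The collapsed form can never be odd, for any value of a:
        for (int v : new int[] { 0, 3, 4, -7, Integer.MAX_VALUE }) {
            if (isOdd(2 * v)) {
                throw new AssertionError("2*a should always be even");
            }
        }
        System.out.println("2*a is always even");
    }
}
```

Whether the odd sum is actually observed in a given run depends on scheduling; the evenness of 2 * a does not.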

The main question is: is this optimization a problem?

 Q. Is the transformed code sequentially consistent?

Ans. A program is correctly synchronized if and only if, when it is executed in a sequentially consistent manner, there are no data races. See example 17.4.8-1 in JLS chapter 17.

  Sequential consistency: the result of any execution is the same as if the read and write operations by all processes were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program [Lamport, 1979]. See also http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.3

Sequential consistency is a strong guarantee. The execution path where the compiler optimizes a + a into 2 * a is also a valid sequentially consistent execution . So the answer is yes.

 Q. Does the transformed code respect the happens-before guarantees?

Ans. Sequential consistency implies that the happens-before guarantees hold here. So the answer is yes. JLS ref

Therefore, I do not think this optimization is invalid, at least in the OP's case. The case where thread 2's while loop runs forever is also entirely possible without the compiler transformation.

+2
Dec 24 '14 at 23:49

On the one hand, the very purpose of a volatile read is that it should always be fresh from memory.

This is not how the Java Language Specification defines volatile. The JLS merely says:

A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).

Therefore, a write to a volatile variable happens-before (and is visible to) any subsequent read of that same variable.

This constraint is trivially satisfied for a read that is not subsequent. That is, volatile only guarantees visibility of a write if the read is known to occur after the write.

This is not the case in your program. For every well-formed execution that observes a to be 1, I can construct another well-formed execution where a is observed to be 0, simply by moving the read after the write. This is possible because the happens-before relation looks as follows:

 write 1  -->  read 1           write 1  -->  read 1
    |            |                 |            |
    |            v                 v            |
    |    -->  read 1            write 0         v
    v                              |     -->  read 0
 write 0           vs.             |            |
    |                              |            |
    v                              v            v
 write 1  -->  read 1           write 1  -->  read 1

Thus, all the JMM guarantees for your program is that a + a will yield 0, 1, or 2. That guarantee is satisfied even if a + a always yields 0. Just as the operating system is allowed to run this program on a single core, and to always interrupt thread 1 at the same loop instruction, the JVM is allowed to reuse the value; after all, the observable behavior remains the same.

In general, moving a read across a write violates happens-before consistency only if some other synchronization action is "in the way". In the absence of such an intermediate synchronization action, a read may be satisfied from a cache.

+2
Dec 25 '14 at 22:32

As stated in the other answers, there are two reads and two writes. Imagine the following execution (T1 and T2 denote the two threads), using annotations that match the JLS statement below:

  • T1: a = 0 //W(r)
  • T2: read temp1 = a //r_initial
  • T1: a = 1 //w
  • T2: read temp2 = a //r
  • T2: print temp1+temp2

In a concurrent environment, this is certainly a possible interleaving of the threads. Your question then becomes: can the JVM make r observe W(r) and read 0 instead of 1?

JLS §17.4.5 states:

A set of actions A is happens-before consistent if for all reads r in A, where W(r) is the write action seen by r, it is not the case that either hb(r, W(r)) or that there exists a write w in A such that w.v = r.v and hb(W(r), w) and hb(w, r).

The optimization you suggest ( temp = a; print(2 * temp); ) would violate this requirement. Your optimization can therefore only work if there is no intermediate write between r_initial and r , which cannot be guaranteed in a typical multi-threaded environment.

As a side note, observe however that there is no guarantee on how quickly writes become visible to reading threads. See, for example: Detailed semantics of volatile regarding timeliness of visibility .

-2
Dec 23 '14 at 14:44
source share


