Why does (*p = *p) & (*q = *q); trigger undefined behavior in C?

Question

Why does (*p=*p) & (*q=*q); have undefined behavior in C if p and q are equal?

 int f2(int * p, int * q) { (*p=*p) & (*q=*q); *p = 1; *q = 2; return *p + *q; }

Source (Good article by the way): http://blog.frama-c.com/index.php?post/2012/07/25/On-the-redundancy-of-C99-s-restrict

c undefined-behavior sequence-points pointers

nawfel bgh Jun 28 '15 at 13:29

5 answers

If *p and *q denote the same memory location, then writing both of them without an intervening sequence point (or, in C11 terms, without a sequenced-before relation) causes undefined behavior.

Neither = nor & introduces a sequence point.

The code is equivalent to int i = 0; (i=i) & (i=i);, which has UB for the same reason. Another similar example would be (*p = 1) & (*q = 2).
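For contrast, a minimal sketch (touch_sequenced is a hypothetical name, not from the thread): replacing & with the comma operator, which does have a sequence point after its left operand, makes the two self-assignments sequenced, so the function below is well defined even when p == q. For the same reason it tells the compiler nothing about aliasing.

```c
/* Hypothetical helper: the comma operator has a sequence point between
   its left and right operands, so these two self-assignments are
   sequenced. Unlike (*p = *p) & (*q = *q), this is well defined even
   when p == q, and therefore promises nothing about aliasing. */
void touch_sequenced(int *p, int *q)
{
    (*p = *p), (*q = *q);
}
```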

MM Jun 28 '15 at 14:53

In simple terms, (*p = *p) & (*q = *q) is undefined if p and q have the same value, because:

You cannot modify the same location twice in an unsequenced evaluation; and
You cannot read from a location that is being modified in the same unsequenced evaluation.

The behavior is undefined in both C and C++, although the standards' wording differs slightly (and the text above does not follow the wording of either standard; it is intended as a simplified explanation. I'm sure you can find the exact texts on SO.)

The & operator is a simple bitwise AND operator, so it does not impose any order of evaluation. It may seem that *p = *p is an obvious no-op, but there is no guarantee that it will be implemented that way. The compiler could (for example) implement it as tmp = *p; *p = 0; *p += tmp. It might also be unable to set all the bits of *p at once, requiring the assignment to be done in pieces.
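To see why that matters, here is a simulation, not real compiler output: it assumes a hypothetical compiler that lowers each self-assignment into the three steps tmp = *x; *x = 0; *x += tmp; from the paragraph above and then interleaves the steps of the two unsequenced operands of &. The function name interleaved_self_assign is illustrative. The simulation itself is ordinary, well-defined C because every step is a separate statement.

```c
/* Simulation of one permitted interleaving of the two operands of
   (*p = *p) & (*q = *q), assuming each `*x = *x` is lowered as
       tmp = *x; *x = 0; *x += tmp;
   Returns the final value of *p. */
int interleaved_self_assign(int *p, int *q)
{
    int tmp_p = *p;   /* left operand:  tmp = *p  */
    int tmp_q = *q;   /* right operand: tmp = *q  */
    *p = 0;           /* left operand:  *p = 0    */
    *q = 0;           /* right operand: *q = 0    */
    *p += tmp_p;      /* left operand:  *p += tmp */
    *q += tmp_q;      /* right operand: *q += tmp */
    return *p;
}
```

With distinct objects every value is preserved; with p == q and an initial value of 5, both temporaries hold 5, the object is zeroed twice and then incremented twice, ending at 10 instead of 5.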

Now a little personal pet peeve. The phrase "<something> triggers undefined behavior" makes it sound as if there is a specific category of behavior called "undefined behavior", perhaps some kind of big red button that starts shooting nasal demons in all directions when pressed. That is not a good model of what is happening. It is better to say that "the behavior of <something> is undefined".

Remember that if any part of a program that actually executes has undefined behavior, the behavior of the entire program is undefined: the whole program, not just the part of the program starting with the undefined operation.

Finally, and this is the point of the linked article, the compiler is allowed to assume that the behavior of the program is defined. Therefore, if the program includes an expression like (*p = *p) & (*q = *q), the compiler can assume that p and q point to different objects that do not overlap. Once it makes this assumption, it can better optimize expressions involving both *p and *q. It is also likely that, having made this assumption, it can eliminate the entire computation (*p = *p) & (*q = *q), since the intermediate values of *p and *q (if any) are not observable if p and q are different. So you can think of this expression as a kind of declaration: you promise the compiler that you have done whatever is necessary to ensure that p and q point to different, non-overlapping objects. (The compiler will not, and probably cannot, verify your claim. It will just take your word for it.)
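To make that concrete, here is a hypothetical sketch of what a compiler is permitted to reduce the question's f2 to once it assumes p and q point to different objects (f2_as_if_optimized is an illustrative name, not actual output from any compiler). For every call with distinct objects it behaves exactly like f2; calls with p == q are undefined for f2 anyway.

```c
/* Hypothetical "as if optimized" form of f2 under the assumption p != q:
   the self-assignments are dropped, and the final loads are folded to
   the constants that were just stored. */
int f2_as_if_optimized(int *p, int *q)
{
    *p = 1;
    *q = 2;
    return 3;   /* 1 + 2, correct only when p and q do not alias */
}
```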

The author then claims that this idiom is more expressive than the (somewhat controversial) restrict keyword. I have no doubt that it is, and you could probably construct expressions covering a range of non-aliasing constraints that cannot easily be expressed with restrict. So it seems like an interesting idea. On the other hand, the precise expression is obscure, to say the least, and easy to get wrong.
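For comparison, here is the question's f2 written with C99's restrict instead of the idiom; a minimal sketch, with f2_restrict as an illustrative name. For this particular function, the restrict-qualified parameters give the compiler the same license the article derives from (*p=*p) & (*q=*q): it may fold the return value to 3, and calling it with p == q is undefined.

```c
/* f2 using restrict (C99) instead of (*p=*p) & (*q=*q).
   Roughly: an object modified through p must not also be accessed
   through q during the call, and vice versa. */
int f2_restrict(int *restrict p, int *restrict q)
{
    *p = 1;
    *q = 2;
    return *p + *q;   /* a compiler may fold this to 3 */
}
```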

rici Jun 28 '15 at 23:17

When the C Standard was written, if the effect of a particular action would vary between platforms, it would not always be possible for a given platform to guarantee any particular precise effect, and if there could plausibly be implementations where the action might cause a hardware trap whose behavior was outside the C compiler's control, there was little perceived value in having the Standard say anything about the behavior. Even if there wasn't any significant likelihood of a hardware trap, the possibility of "surprising" behavior was sufficient to brand a behavior as Undefined.

Consider, for example, unsigned long x, *p; ... *p = (x++);. If p == &x, then not only might *p end up holding either the old value of x or the value one greater, but if x was, for example, 0x0000FFFF, it could also plausibly end up holding 0x00000000 or 0x0001FFFF. Even if no machine would raise a hardware trap, I don't think the authors of the Standard would have considered "Any lvalue that is modified more than once will hold an indeterminate value, and any read of an lvalue in the same expression that writes it, other than in the ways allowed here, may yield an indeterminate value" to be more useful than simply classifying such actions as Undefined Behavior. Further, from the point of view of the Standard's authors, the Standard's failure to mandate particular behaviors in cases where some platforms could provide them for free while others could not would pose no obstacle to platforms that could provide such behaviors specifying them.
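The two surprising results can be reproduced with a small simulation. It assumes a hypothetical machine that can only store 16 bits at a time, so both the store *p = old_x and the increment x++ are each performed as two half-word writes, and it replays two interleavings of those four writes explicitly. uint32_t stands in for a 32-bit unsigned long, and all function names are illustrative; the code itself contains no UB.

```c
#include <stdint.h>

/* Replace the low or high 16 bits of v with `half`. */
static uint32_t set_low16(uint32_t v, uint16_t half)
{
    return (v & 0xFFFF0000u) | half;
}

static uint32_t set_high16(uint32_t v, uint16_t half)
{
    return (v & 0x0000FFFFu) | ((uint32_t)half << 16);
}

/* Order: store low, increment low, increment high, store high. */
uint32_t interleaving_a(uint32_t x)
{
    uint32_t old = x, inc = x + 1u;
    x = set_low16(x, (uint16_t)(old & 0xFFFFu));  /* store writes low half  */
    x = set_low16(x, (uint16_t)(inc & 0xFFFFu));  /* x++ writes low half    */
    x = set_high16(x, (uint16_t)(inc >> 16));     /* x++ writes high half   */
    x = set_high16(x, (uint16_t)(old >> 16));     /* store writes high half */
    return x;
}

/* Order: increment low, store low, store high, increment high. */
uint32_t interleaving_b(uint32_t x)
{
    uint32_t old = x, inc = x + 1u;
    x = set_low16(x, (uint16_t)(inc & 0xFFFFu));  /* x++ writes low half    */
    x = set_low16(x, (uint16_t)(old & 0xFFFFu));  /* store writes low half  */
    x = set_high16(x, (uint16_t)(old >> 16));     /* store writes high half */
    x = set_high16(x, (uint16_t)(inc >> 16));     /* x++ writes high half   */
    return x;
}
```

Starting from 0x0000FFFF, interleaving_a ends at 0x00000000 and interleaving_b at 0x0001FFFF, neither of which is the old value or the incremented one.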

In practice, even very loosely specified behavior can often be very useful for programs that must meet the following two requirements, as the vast majority of programs written today must:

When given valid input, produce correct output.
When given invalid input, don't launch nuclear missiles.

Unfortunately, someone came up with the idea that if the C Standard imposes no requirements on the behavior of some action X in a particular situation Y, even when most compilers have behavior that would be adequate for programs aiming to satisfy the above requirements (for example, most compilers will generate code for the expression p < q that yields either 0 or 1 with no other side effects, even if p and q identify unrelated objects), then action X should be regarded as an indication to the compiler that the program will never receive any input that could cause situation Y.

The expression (*p=*p) & (*q=*q) is intended to represent such a "promise". The logic is that since the Standard says nothing about what a compiler may do if p == q, the compiler may assume that the programmer does not mind if the program launches nuclear missiles in response to any input that would cause the code to be executed with p == q.

This idea and its consequences are fundamentally at odds with the nature and design goals of C, and with its use as a systems programming language. Almost all systems offer some features and guarantees beyond those provided by the Standard, although the specifics vary from one system to another. I find it absurd that the language is thought to be better served by redefining x < y from "I am prepared to accept whatever means of comparing pointers is used by any hardware this program will actually run on" to "I am so certain these two pointers are related that I would stake my life on it" than it would be by adding a new means of telling the compiler to assume that "x and y are related pointers", yet somehow that view seems to have been accepted.

supercat Jun 29 '15 at 17:56

The question in this thread is "Why does (*p=*p) & (*q=*q); have undefined behavior in C if p and q are equal?", and the asker refers to an article explaining that the new keyword restrict in C (and C++?) is not needed, because we can tell the compiler the same thing by writing the expression (*p=*p) & (*q=*q);.

The explanation of this expression by Iwillnotexist Idonotexist is very thorough... and very complex. Essentially, the conclusion is that the expression is more of a directive than a statement, because it produces no result that is used and has only side effects (self-assignment) that have no effect (the object remains unchanged, even if p == q), so any good compiler can optimize it away.

Still not fully understanding the explanation, I'll choose the new keyword rather than write that strange expression.

Paul ogilvie Jun 29 '15 at 13:09

Iwillnotexist idonotexist · Accepted Answer · 2015-06-28T15:25:16+0000

The C11 Standard's verdict on the statement

 (*p=*p) & (*q=*q);

is as follows:

P1

§6.5p3
The grouping of operators and operands is indicated by the syntax.85) Except as specified later, side effects and value computations of subexpressions are unsequenced.

Since §6.5.10 (Bitwise AND operator) does not specify any sequencing of its operands, it follows that (*p=*p) and (*q=*q) are unsequenced.

P2

§6.5p2
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.84)

The assignments (*p=*p) and (*q=*q) are unsequenced with respect to each other by §6.5p3, and both have a side effect on the same object if p == q. Therefore, if p == q, we have UB by §6.5p2.

P3

§3.4.3
undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

From this clause we know that the Standard imposes no requirements on UB. Compilers usually interpret this as a license to ignore the possibility of such behavior.

In particular, it allows the compiler not to handle the case p == q, which means it can assume that p != q.

P1 + P2 + P3 → C1

Since (*p=*p) and (*q=*q) can be assumed, by the combined premises P1, P2 and P3, not to cause UB, they can also be assumed to be loads and stores to different memory locations. It also means that the return value of f2 must be 3, not 4. If p == q, the Standard imposes no requirements on what happens.
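Because the promise is the caller's to keep, a caller can enforce it before calling f2. A minimal sketch, assuming a hypothetical wrapper call_f2_if_distinct (not from the article or the thread) that refuses the aliased case; checking only p == q assumes both pointers refer to whole int objects.

```c
/* The function from the question: UB if p == q, otherwise returns 3. */
int f2(int *p, int *q)
{
    (*p = *p) & (*q = *q);
    *p = 1;
    *q = 2;
    return *p + *q;
}

/* Hypothetical wrapper that never calls f2 with p == q.
   Returns 1 and stores f2's result in *result on success, 0 otherwise. */
int call_f2_if_distinct(int *p, int *q, int *result)
{
    if (p == q)
        return 0;   /* calling f2(p, q) here would be undefined behavior */
    *result = f2(p, q);
    return 1;
}
```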
