Is converting between pointer-to-T, array-of-T, and pointer-to-array-of-T ever undefined behavior?

Consider the following code.

    #include <stdio.h>

    int main() {
        typedef int T;
        T a[] = { 1, 2, 3, 4, 5, 6 };
        T(*pa1)[6] = (T(*)[6])a;
        T(*pa2)[3][2] = (T(*)[3][2])a;
        T(*pa3)[1][2][3] = (T(*)[1][2][3])a;
        T *p = a;
        T *p1 = *pa1;
        //T *p2 = *pa2; //error in c++
        //T *p3 = *pa3; //error in c++
        T *p2 = **pa2;
        T *p3 = ***pa3;
        printf("%p %p %p %p %p %p %p %p\n", a, pa1, pa2, pa3, p, p1, p2, p3);
        printf("%d %d %d %d %d %d %d %d\n", a[5], (*pa1)[5], (*pa2)[2][1],
               (*pa3)[0][1][2], p[5], p1[5], p2[5], p3[5]);
        return 0;
    }

The above code compiles and runs in C, producing the expected results. All the pointer values are the same, as are all the int values. I think the result will be the same for any type T, but int is the easiest to work with.

I admit that I was initially surprised that dereferencing a pointer to an array yields an identical pointer value, but on reflection I think this is merely the converse of the array-to-pointer decay that we know and love.

[EDIT: the commented-out lines trigger errors in C++ and warnings in C. I find the C standard vague on this point, but this is not the real question.]

In another question, this was alleged to be undefined behavior, but I do not see it. Am I right?

The code is here if you want to see it.


Right after I wrote the above, it dawned on me that those errors occur because there is only one level of pointer decay in C++. More dereferencing was needed!

    T *p2 = **pa2;  //no error in c or c++
    T *p3 = ***pa3; //no error in c or c++

And before I managed to finish this edit, @AntonSavin provided the same answer. I have updated the code to reflect these changes.

+6
3 answers

This answer covers C only.

C11 (n1570) 6.3.2.3 p7

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned *) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.

*) In general, the concept "correctly aligned" is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.

The standard is a bit vague about what happens if we use such a pointer (strict aliasing aside) for anything other than converting it back, but the intent and the widespread interpretation is that such pointers should compare equal (and have the same numerical value, e.g. they should also be equal when converted to uintptr_t). As an example, think of (void *)array == (void *)&array (converting to char * instead of void * is explicitly guaranteed to work).

 T(*pa1)[6] = (T(*)[6])a; 

This is fine: the pointer is correctly aligned (it is the same pointer as &a).

    T(*pa2)[3][2] = (T(*)[3][2])a;       // (i)
    T(*pa3)[1][2][3] = (T(*)[1][2][3])a; // (ii)

(i) and (ii) are safe if and only if T[6] has the same alignment requirement as T[3][2] and as T[1][2][3], respectively. It would seem strange to me if they did not, but I cannot find a guarantee in the standard that they must have the same alignment requirements.

    T *p = a;        // safe, of course
    T *p1 = *pa1;    // *pa1 has type T[6], which becomes T * after array-to-pointer conversion, OK
    T *p2 = **pa2;   // **pa2 has type T[2], or T * after conversion, OK
    T *p3 = ***pa3;  // ***pa3 has type T[3], or T * after conversion, OK

Ignoring the UB caused by passing int * where printf expects void *, let's look at the expressions in the arguments of the second printf, starting with the defined ones:

    a[5]             // OK, of course
    (*pa1)[5]
    (*pa2)[2][1]
    (*pa3)[0][1][2]
    p[5]             // same as a[5]
    p1[5]

Note that strict aliasing is not a problem here: no wrongly typed lvalue is involved, and we access a T as a T.

The following expressions depend on the interpretation of out-of-bounds pointer arithmetic. The more relaxed interpretation (which allows container_of, array flattening, the "struct hack" with char[], etc.) allows them as well. The stricter interpretation (which allows a reliable run-time bounds-checking implementation for pointer arithmetic and dereferencing, but disallows container_of, array flattening (though not necessarily array "lifting", which is what you did), the struct hack, etc.) renders them undefined:

    p2[5]  // UB, p2 points to the first element of a T[2] array
    p3[5]  // UB, p3 points to the first element of a T[3] array
+2

UPDATE: The following applies to C++ only; for C, scroll down. In short, there is no UB in C++ and there is UB in C.

8.3.4/7 says:

A consistent rule is followed for multidimensional arrays. If E is an n-dimensional array of rank i x j x ... x k, then E appearing in an expression that is subject to the array-to-pointer conversion (4.2) is converted to a pointer to an (n-1)-dimensional array with rank j x ... x k. If the * operator, either explicitly or implicitly as a result of subscripting, is applied to this pointer, the result is the pointed-to (n-1)-dimensional array, which itself is immediately converted into a pointer.

So this will not produce an error in C++ (and will work as expected):

    T *p2 = **pa2;
    T *p3 = ***pa3;

As for whether this is UB or not, consider the very first conversion:

 T(*pa1)[6] = (T(*)[6])a; 

In C++, this is actually

 T(*pa1)[6] = reinterpret_cast<T(*)[6]>(a); 

And here is what the standard says about reinterpret_cast:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of type "pointer to T1" is converted to the type "pointer to cv T2", the result is static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment requirements of T2 are no stricter than those of T1, or if either type is void.

So a is converted to pa1 through a static_cast to void* and back. The static cast to void* is guaranteed to yield the real address of a, as stated in 4.10/2:

A prvalue of type "pointer to cv T", where T is an object type, can be converted to a prvalue of type "pointer to cv void". The result of converting a non-null pointer value of a pointer to object type to a "pointer to cv void" represents the address of the same byte in memory as the original pointer value.

The subsequent static cast to T(*)[6] is again guaranteed to return the same address, per 5.2.9/13:

A prvalue of type "pointer to cv1 void" can be converted to a prvalue of type "pointer to cv2 T", where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. The null pointer value is converted to the null pointer value of the destination type. If the original pointer value represents the address A of a byte in memory and A satisfies the alignment requirement of T, then the resulting pointer value represents the same address as the original pointer value, that is, A.

Thus pa1 is guaranteed to point to the same byte in memory as a, and so accessing data through it is perfectly valid, because the alignment of an array is the same as the alignment of its element type.

How about C?

Consider again:

 T(*pa1)[6] = (T(*)[6])a; 

C11 6.3.2.3/7 states the following:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

This means that unless the conversion is to char*, the value of the converted pointer is not guaranteed to be equal to the value of the original pointer, which results in undefined behavior when data is accessed through the converted pointer. To make it work, the conversion has to be done explicitly through void*:

 T(*pa1)[6] = (T(*)[6])(void*)a; 

Conversions back to T *

    T *p = a;
    T *p1 = *pa1;
    T *p2 = **pa2;
    T *p3 = ***pa3;

All of these are conversions from array of T to pointer to T, which are valid in both C++ and C, and no UB is triggered by accessing data through the converted pointers.

+2

The only reason your code compiles in C is that your default compiler setup allows the compiler to implicitly perform some illegal pointer conversions. Formally, this is not allowed in C. These lines

    T *p2 = *pa2;
    T *p3 = *pa3;

are ill-formed in C++ and are constraint violations in C. In casual parlance, these lines are errors in both C and C++.

Any self-respecting C compiler will issue (and is in fact required to issue) diagnostic messages for these constraint violations. GCC, for example, will issue "warnings" telling you that the pointer types in the above initializations are incompatible. While "warnings" are sufficient to satisfy the standard's requirements, if you really want to use GCC's ability to recognize constraint-violating C code, you should run it with the -pedantic-errors switch and, preferably, explicitly select the standard language version with the -std= switch.

In your experiment, the C compiler performed these implicit conversions for you as a non-standard compiler extension. However, the fact that the GCC compiler running behind the ideone front end completely suppressed the corresponding warning messages (which standalone GCC issues even in its default configuration) means that ideone is a broken C compiler. Its diagnostic output cannot be meaningfully used to tell valid C code from invalid.

As for the conversion itself: performing it is not undefined behavior. But it is undefined behavior to access the array data through the converted pointers.

+2

Source: https://habr.com/ru/post/974440/