Is reinterpret_cast from unsigned char* to uint64_t* UB?

Suppose we take a very large array of unsigned chars.

 std::array<uint8_t, 100500> blob;
 // ... fill array ...

(Note: the array is already suitably aligned; the question is not about alignment.) Then we treat it as uint64_t[] and try to read from it:

 const auto ptr = reinterpret_cast<const uint64_t*>(blob.data());
 std::cout << ptr[7] << std::endl;

Casting to uint64_t* and then reading through the result looks suspicious to me.

But neither UBSan nor -Wstrict-aliasing complains about it. Google uses this technique in FlatBuffers, and Cap'n Proto uses it as well.

Is this behavior undefined?

2 answers

You cannot access the value of an unsigned char object through a glvalue of another type. The reverse, however, is allowed: you can access the value of any object through an unsigned char glvalue [basic.lval]:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined: [...]

  • a char, unsigned char, or std::byte type.

So, to be 100% standard compliant, the idea is to reverse the reinterpret_cast:

 uint64_t i;
 std::memcpy(&i, blob.data() + 7*sizeof(uint64_t), sizeof(uint64_t));
 std::cout << i << std::endl;

And it will produce the same assembly.
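The memcpy approach can be wrapped in a small helper. This is only a minimal sketch: the name load_u64 and its bounds check are assumptions made for illustration, not something FlatBuffers or Cap'n Proto is known to provide.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the original post): reads the index-th
// uint64_t-sized slot of a byte buffer without going through a uint64_t glvalue
// that aliases the bytes.
template <std::size_t N>
std::uint64_t load_u64(const std::array<std::uint8_t, N>& blob, std::size_t index) {
    // Stay inside the buffer.
    assert((index + 1) * sizeof(std::uint64_t) <= N);
    std::uint64_t value;
    // memcpy copies the object representation byte by byte, which is allowed
    // for any trivially copyable type.
    std::memcpy(&value, blob.data() + index * sizeof(std::uint64_t), sizeof value);
    return value;
}
```

With this helper, load_u64(blob, 7) reads the same bytes that ptr[7] would, and it does not require the buffer to be aligned for uint64_t.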


The cast itself is well defined (a reinterpret_cast never has UB), but the lvalue-to-rvalue conversion in the expression "ptr[7]" would be UB if no uint64_t object had been created at that address.

Since the "// ... fill array ..." part is not shown, a uint64_t object could have been constructed at that address (assuming, as you say, that the address is suitably aligned):

 const uint64_t* p = new (blob.data() + 7 * sizeof(uint64_t)) uint64_t(); 

If a uint64_t object was created at this address, then the code in question has well-defined behavior.
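A minimal sketch of that construction step follows. The helper name make_u64_at is hypothetical, and the sketch uses unsigned char storage (uint8_t is usually, but not necessarily, the same type). It reads through the pointer returned by placement new, which avoids any question about the reinterpret_cast pointer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper: starts the lifetime of a uint64_t in the index-th
// 8-byte slot of raw storage and returns a pointer to the new object.
inline std::uint64_t* make_u64_at(unsigned char* storage, std::size_t index,
                                  std::uint64_t value) {
    unsigned char* where = storage + index * sizeof(std::uint64_t);
    // Placement new requires suitably aligned storage.
    assert(reinterpret_cast<std::uintptr_t>(where) % alignof(std::uint64_t) == 0);
    return new (where) std::uint64_t(value);
}
```

A caller would declare the buffer as, for example, alignas(std::uint64_t) std::array<unsigned char, 128> and then read the value through the returned pointer.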


Source: https://habr.com/ru/post/1274858/

