ARM Neon: How to convert from uint8x16_t to uint8x8x2_t?

I recently learned about the vreinterpret{q}_dsttype_srctype casting operator. However, it does not seem to support conversion to the data type described at this link (at the bottom of the page):

Some intrinsics use an array of vector types of the form:

<type><size>x<number of lanes>x<length of array>_t

These types are treated as ordinary C structures containing a single element named val.

An example structure definition is:

 struct int16x4x2_t { int16x4_t val[2]; }; 

Do you know how to convert from uint8x16_t to uint8x8x2_t ?

Note that the problem cannot be solved reliably using a union (reading from an inactive member leads to undefined behavior. Edit: that is only the case for C++; it turns out that C allows type punning) or by casting pointers (which violates the strict aliasing rule).

+5
4 answers

Based on your comments, it seems that you want to perform a bona fide conversion, that is, to produce a distinct, new, separate value of a different type. This is a very different thing from a reinterpretation, which is what the lead-in to your question suggests you wanted. In particular, you posit variables declared like this:

 uint8x16_t a;
 uint8x8x2_t b;
 // code to set the value of a ...

and you want to know how to set the value of b so that it is in some sense equivalent to the value of a.

Speaking to the C language:

The strict aliasing rule (C2011 6.5/7) says:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object, [...]
  • an aggregate or union type that includes one of the aforementioned types among its members [...], or
  • a character type.

(Emphasis added. The other enumerated options cover differently qualified and differently signed versions of the effective type of the object or of compatible types; they are not relevant here.)

Note that these provisions never interfere with accessing the value of a, including the values of its members, via the variable a, and similarly for b. But do not overlook the use of the term "effective type": this is where things can go wrong under slightly different circumstances. More on that later.

Using a union

C certainly permits you to perform the conversion via an intermediate union, or you could rely on b being a union member in the first place, which removes the "intermediate" part:

 union {
     uint8x16_t x1;
     uint8x8x2_t x2;
 } temp;
 temp.x1 = a;
 b = temp.x2;
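
As a minimal sketch of the same idea that builds without <arm_neon.h>, the following wraps the union conversion in a function. The types vec16_t, vec8_t and vec8x2_t and the function convert_via_union are hypothetical stand-ins (they are not NEON types) that only mimic the layout of uint8x16_t and uint8x8x2_t; with the real types the body would look the same.

```c
#include <stdint.h>

/* Hypothetical stand-ins for uint8x16_t, uint8x8_t and uint8x8x2_t,
   used only so that this sketch compiles on any target. */
typedef struct { uint8_t lane[16]; } vec16_t;
typedef struct { uint8_t lane[8]; } vec8_t;
typedef struct { vec8_t val[2]; } vec8x2_t;

/* Convert by storing into one union member and reading the other.
   This kind of punning is permitted in C, but not in C++. */
static vec8x2_t convert_via_union(vec16_t a)
{
    union {
        vec16_t x1;
        vec8x2_t x2;
    } temp;

    temp.x1 = a;
    return temp.x2;
}
```

Because both members have the same size and contain only uint8_t lanes, every byte of x1 is read back unchanged through x2.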

Using a type-punned pointer (to produce UB)

However, although it is not uncommon, C does not permit you to type-pun via a pointer:

 // UNDEFINED BEHAVIOR - strict-aliasing violation
 b = *(uint8x8x2_t *)&a;
 // DON'T DO THAT

Here you access the value of a, whose effective type is uint8x16_t, through an lvalue of type uint8x8x2_t. Note that it is not the cast that is forbidden, nor even, I would argue, the dereferencing: it is reading the dereferenced value in order to apply the side effect of the = operator.

Using memcpy()

Now, what about memcpy()? This is where it gets interesting. C permits the stored values of a and b to be accessed via lvalues of character type, and although its arguments are declared to have type void *, this is the only plausible interpretation of how memcpy() works. Certainly its description characterizes it as copying characters. There is therefore nothing wrong with performing

 memcpy(&b, &a, sizeof a); 

Having done so, you may freely access the value of b via the variable b, as already mentioned. There are aspects of doing so that could be problematic in a more general context, but there is no UB here.
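
A small sketch of this memcpy() conversion, written so that it compiles without <arm_neon.h>: vec16_t, vec8_t and vec8x2_t are hypothetical stand-ins for uint8x16_t, uint8x8_t and uint8x8x2_t, and convert_via_memcpy is an illustrative name. The size check is an addition of this sketch, not something the answer requires.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the NEON types, for portability only. */
typedef struct { uint8_t lane[16]; } vec16_t;
typedef struct { uint8_t lane[8]; } vec8_t;
typedef struct { vec8_t val[2]; } vec8x2_t;

/* Assumption of this sketch: refuse to build if the sizes ever differ. */
_Static_assert(sizeof(vec16_t) == sizeof(vec8x2_t), "size mismatch");

static vec8x2_t convert_via_memcpy(vec16_t a)
{
    vec8x2_t b;               /* b has a declared type, so its effective
                                 type is fixed regardless of the copy */
    memcpy(&b, &a, sizeof a); /* bytes are copied as characters */
    return b;
}
```

The destination here is a declared object, which is the property the argument above relies on.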

However, contrast this with the superficially similar situation in which you want to put the converted value into dynamically allocated space:

 uint8x8x2_t *c = malloc(sizeof(*c));
 memcpy(c, &a, sizeof a);

What could be wrong with that? Nothing is wrong with it, as far as it goes, but you have UB if you afterward try to access the value of *c. Why? Because the memory to which c points does not have a declared type, its effective type is the effective type of whatever was last stored in it (if that has an effective type), including when that value was copied into it via memcpy() (C2011 6.5/6). As a result, the object to which c points has effective type uint8x16_t after the copy, whereas the expression *c has type uint8x8x2_t; the strict aliasing rule says that accessing that object via that lvalue produces UB.
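
One way to sidestep the problem, sketched here under the reading of C2011 6.5/6 given above, is to convert into a declared temporary first and then store into the allocated space through an lvalue of the target type, so that the store itself sets the effective type. The types vec16_t, vec8_t and vec8x2_t are hypothetical stand-ins for the NEON types, and convert_to_heap is an illustrative name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the NEON types, for portability only. */
typedef struct { uint8_t lane[16]; } vec16_t;
typedef struct { uint8_t lane[8]; } vec8_t;
typedef struct { vec8_t val[2]; } vec8x2_t;

/* Returns a heap copy of the converted value, or NULL on allocation
   failure. The caller is responsible for calling free(). */
static vec8x2_t *convert_to_heap(vec16_t a)
{
    vec8x2_t tmp;
    vec8x2_t *c = malloc(sizeof *c);

    if (c == NULL)
        return NULL;

    memcpy(&tmp, &a, sizeof a); /* tmp has declared type vec8x2_t */
    *c = tmp;                   /* store through a vec8x2_t lvalue, so the
                                   allocated object gets that effective type */
    return c;
}
```

Compilers typically reduce the extra copy to a plain 16-byte store, though that is an expectation rather than a guarantee.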

+5

It is completely legal in C++ to type-pun via pointer casting, as long as you only do it to char*. Not coincidentally, this is what memcpy is defined as working on (technically unsigned char*, which is good enough).

Kindly observe the following passage:

For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. [42] If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value. [Example:

 #define N sizeof(T)
 char buf[N];
 T obj;                      // obj initialized to its original value
 std::memcpy(buf, &obj, N);  // between these two calls to std::memcpy,
                             // obj might be modified
 std::memcpy(&obj, buf, N);  // at this point, each subobject of obj of scalar type
                             // holds its original value

- end example]

Simply put, copying like this is the intended function of std::memcpy. As long as the types you are dealing with meet the necessary triviality requirements, it is entirely legitimate.

Strict aliasing does not apply to char* or unsigned char*: you are free to alias any type with these.
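
The character-type exemption exists in C as well, and a short C sketch can illustrate it. The helper sum_bytes below is a hypothetical name; it reads the bytes of an arbitrary object through an unsigned char pointer, which the aliasing rules allow for any object type.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the bytes of any object by reading them through a pointer to
   unsigned char. Access via a character type is always permitted. */
static unsigned sum_bytes(const void *obj, size_t n)
{
    const unsigned char *p = obj;
    unsigned total = 0;

    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

For example, sum_bytes(&x, sizeof x) for a uint32_t x holding 0x01020304 adds the bytes 1, 2, 3 and 4 in whatever order the target stores them.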

Note that for unsigned ints specifically, you have some very explicit leeway here. The C++ Standard requires that they meet the requirements of the C Standard, and the C Standard mandates the format. The only way trap representations or anything like that can be involved is if your implementation has padding bits, but ARM does not have any: it uses 8-bit bytes and 8-bit and 16-bit integers. So for unsigned integers on implementations with zero padding bits, any byte pattern is a valid unsigned integer.

For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.
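
Whether a given implementation has padding bits can be checked directly. The following is a rough sketch for unsigned int only; value_bits_uint and padding_bits_uint are hypothetical helper names. It counts the value bits by shifting UINT_MAX and compares the count with the size of the object representation.

```c
#include <limits.h>

/* Count the value bits of unsigned int: UINT_MAX is 2^N - 1, so it has
   exactly N one-bits to shift out. */
static unsigned value_bits_uint(void)
{
    unsigned n = 0;

    for (unsigned v = UINT_MAX; v != 0; v >>= 1)
        n++;
    return n;
}

/* Padding bits are whatever remains of the object representation. */
static unsigned padding_bits_uint(void)
{
    return (unsigned)(sizeof(unsigned) * CHAR_BIT) - value_bits_uint();
}
```

On mainstream targets, ARM included, padding_bits_uint() is expected to return 0, which is the situation the paragraph above relies on.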

+6

So there are a bunch of gotchas here. This answer reflects C++.

First, you can convert trivially copyable data to char* or unsigned char* or std::byte*, then copy it from one location to another. The result is defined behavior. The values of the bytes are unspecified.

If you do this from a value of one type to another via something like memcpy, accessing the target can result in undefined behavior, unless the target type has valid values for all byte representations or the layout of the two types is specified by your compiler.

There is the possibility of "trap representations" in the target type: byte combinations that result in machine exceptions or something similar if they are interpreted as a value of that type. Imagine a system that does not use IEEE floats and where doing math on NaN or INF or the like causes a segfault.

There are also alignment concerns.

In C, I believe that type punning via unions is legal, with similar qualifications.

Finally, note that under a strict reading of the C++ standard, foo* pf = (foo*)malloc(sizeof(foo)); is not a pointer to a foo, even if foo is plain old data. You must create an object before interacting with it, and the only way to create an object outside of automatic storage is via new or placement new. This means that you must have data of the target type before you memcpy into it.

+2

Do you know how to convert from uint8x16_t to uint8x8x2_t?

 uint8x16_t input = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
 uint8x8x2_t output = { vget_low_u8(input), vget_high_u8(input) };

You need to understand that with NEON intrinsics, uint8x16_t represents a 16-byte register, while uint8x8x2_t represents two adjacent 8-byte registers. For ARMv7 these may be the same thing (q0 == {d0, d1}), but for ARMv8 the register layout is different. It is necessary to extract the low 8 bytes and the high 8 bytes of the single 16-byte register using two functions. The clang compiler will determine which instructions are needed depending on the context.
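
A self-contained sketch of this approach, operating on plain byte arrays so the result can be inspected: split_u8x16 is a hypothetical helper name. On targets where the compiler defines __ARM_NEON it uses vld1q_u8, vget_low_u8, vget_high_u8 and vst1_u8; elsewhere it falls back to memcpy() purely so that the example still builds, which is an assumption of this sketch and not part of the answer.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Split 16 bytes into low and high halves using NEON intrinsics. */
static void split_u8x16(const uint8_t in[16], uint8_t lo[8], uint8_t hi[8])
{
    uint8x16_t q = vld1q_u8(in);   /* load the 16-byte vector */
    uint8x8x2_t d;

    d.val[0] = vget_low_u8(q);     /* lanes 0..7 */
    d.val[1] = vget_high_u8(q);    /* lanes 8..15 */

    vst1_u8(lo, d.val[0]);
    vst1_u8(hi, d.val[1]);
}
#else
/* Fallback for non-NEON targets: same result, no intrinsics. */
static void split_u8x16(const uint8_t in[16], uint8_t lo[8], uint8_t hi[8])
{
    memcpy(lo, in, 8);
    memcpy(hi, in + 8, 8);
}
#endif
```

The reverse direction, from two uint8x8_t halves back to one uint8x16_t, is available through vcombine_u8.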

0

Source: https://habr.com/ru/post/1266971/

