How do I access the m128i_i8 member, or the members of an __m128i object in general?

I understand that Microsoft recommends against accessing the members of these objects directly, but I need to set them, and the documentation is not enough.

I keep getting the error "request for member 'm128i_i8' in '(my var name)', which is of non-class type 'wirelabel {aka __vector(2) long long int}'", which I do not understand, because I included all the correct headers and the compiler does recognize __m128i variables.

Note1: wirelabel is a typedef for __m128i, i.e. a header contains

typedef __m128i wirelabel;

Note2: The reason for the typedef in Note1 is explained in this other question: tbb::cache_aligned_allocator: Getting "request for member ... which is of non-class type" with __m128i. User error or bug?

Note3: I am using the g++ compiler.

Note4: The following question does not answer mine, but it discusses related information: Why should you not directly access the __m128i fields?

I also know that there is a _mm_set_epi8 function, but it requires setting all of the 8-bit parts at once, and that is not an option for me at the moment.

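Context: basically, I am trying to set the 16 8-bit parts of an __m128i individually. I have an array of "bool" of size n * 128 (n is a size_t) and an array of "wirelabel" of size "n". Since wirelabel is just an alias/typedef (is there a difference?) for __m128i, each of the "n" elements is 128 bits long, and I want to convert every 128 bools into one "wirelabel". So I am trying to pack 8 bools at a time into each 8-bit part of a "wirelabel". Any help is appreciated.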

+4

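So you have an array of bool and you want to pack it into a bitmap? You can read the result back later with _mm_load_si128, or with scalar code.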


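A bool (1 byte in the ABI that g++ uses on x86) is guaranteed to hold 0 or 1. Use SIMD to process 16 of them at a time, not 1 at a time.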

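pmovmskb (_mm_movemask_epi8) is the instruction for this. It takes the high bit of each byte in a vector and packs those 16 bits into an integer register, so one instruction produces 16 bits of the bitmap.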

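We need each bool in the high bit of its byte, and pmovmskb itself is 1 uop on Haswell (port 0); see Agner Fog's instruction tables (http://agner.org/optimize/). Adding 0x7F gives 0x80 (high bit set) for an input of 1, but 0x7F (high bit clear) for an input of 0. (This works because a bool in the x86-64 System V ABI must be stored in memory as an integer 0 or 1, not just 0 vs. any non-zero value.)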

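Why not use pcmpeqb against _mm_set1_epi8(1)? Skylake runs pcmpeqb on ports 0/1, but paddb on all 3 vector ALU ports (0/1/5). That said, it is very common to use pmovmskb on the result of pcmpeqb/w/d/q.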
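As a rough illustration of the two options compared above, here is a minimal sketch that builds the same 16-bit mask both ways. The helper names mask_via_paddb and mask_via_pcmpeqb are made up for this example, and both assume every input byte is exactly 0 or 1.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: add 0x7F so that a byte holding 1 becomes 0x80
// (high bit set) and a byte holding 0 becomes 0x7F (high bit clear).
inline std::uint16_t mask_via_paddb(const bool* src) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    __m128i high = _mm_add_epi8(v, _mm_set1_epi8(0x7F));
    return static_cast<std::uint16_t>(_mm_movemask_epi8(high));
}

// Hypothetical helper: compare against 1 so that matching bytes become
// 0xFF and all other bytes become 0x00.
inline std::uint16_t mask_via_pcmpeqb(const bool* src) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    __m128i eq = _mm_cmpeq_epi8(v, _mm_set1_epi8(1));
    return static_cast<std::uint16_t>(_mm_movemask_epi8(eq));
}
```

Both functions should return the same value for valid bool input; the difference discussed above is only which execution ports the middle instruction can use.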

#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

// n is the number of uint16_t dst elements.
// We read n*16 bool elements from src; each must be stored as 0 or 1.
void pack_bools(uint16_t *dst, const bool *src, size_t n)
{
    // You can later access dst with __m128i loads/stores.

    // 1 + 0x7F = 0x80 (high bit set); 0 + 0x7F = 0x7F (high bit clear).
    const __m128i carry_to_highbit = _mm_set1_epi8(0x7F);
    for (size_t i = 0; i < n; i += 1) {
        __m128i boolvec = _mm_loadu_si128((const __m128i*)&src[i*16]);
        __m128i highbits = _mm_add_epi8(boolvec, carry_to_highbit);
        // pmovmskb: bit j of the result is the high bit of byte j.
        dst[i] = (uint16_t)_mm_movemask_epi8(highbits);
    }
}

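Because we also want to be able to access this bitmap with scalar code, dst should be an array of uint16_t, for strict-aliasing reasons. With AVX2 you would want uint32_t. (Or, with 128-bit vectors, use combine = tmp1 << 16 | tmp to merge two pmovmskb results. That is extra work, though, so it is probably not worth it.)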
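To make the bit layout concrete, here is a small scalar sketch. Both function names are hypothetical; pack_bools_scalar is only a reference that should produce the same layout as pack_bools (bit j of dst[i] holds src[i*16 + j]), and get_packed_bool shows how scalar code could read one flag back.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference: bit j of dst[i] is src[i*16 + j],
// which matches how pmovmskb numbers the bytes of a vector.
inline void pack_bools_scalar(std::uint16_t* dst, const bool* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        unsigned word = 0;
        for (unsigned j = 0; j < 16; ++j)
            word |= static_cast<unsigned>(src[i * 16 + j]) << j;
        dst[i] = static_cast<std::uint16_t>(word);
    }
}

// Hypothetical helper: read back the bool that was originally at src[index].
inline bool get_packed_bool(const std::uint16_t* packed, std::size_t index) {
    return ((packed[index / 16] >> (index % 16)) & 1u) != 0;
}
```

A reference like this is mainly useful for testing the SIMD version against known inputs.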

The loop in pack_bools compiles to asm like this (gcc7.3 -O3, on the Godbolt compiler explorer):

.L3:
    movdqu  xmm0, XMMWORD PTR [rsi]
    add     rsi, 16
    add     rdi, 2
    paddb   xmm0, xmm1
    pmovmskb        eax, xmm0
    mov     WORD PTR [rdi-2], ax
    cmp     rdx, rsi
    jne     .L3

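So it is not wonderful: 7 fused-domain uops per iteration means a front-end bottleneck at 16 bools per ~1.75 clock cycles. Clang unrolls by 2 and should manage 16 bools per 1.5 cycles.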

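Using a shift (pslld xmm0, 7) instead of the add would only run at one vector per 2 cycles on Haswell, because the shift and pmovmskb would both bottleneck on port 0.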
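For completeness, the shift-based alternative could be sketched as follows. The name mask_via_shift is made up for this example, and it again assumes each input byte is 0 or 1. It should give the same mask as the add version; it is shown only to make the comparison concrete, not as a recommendation.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: shift every 32-bit lane left by 7 (pslld xmm, 7).
// Bit 0 of each byte lands in bit 7 of the same byte. Any bits pushed
// into the next byte only reach its low 7 bits, which pmovmskb ignores.
inline std::uint16_t mask_via_shift(const bool* src) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    __m128i high = _mm_slli_epi32(v, 7);
    return static_cast<std::uint16_t>(_mm_movemask_epi8(high));
}
```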

+4

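As far as I know, __m128i is defined as a union only by MSVC, so the m128i_i8 member does not exist with other compilers. Union type-punning is legal in C and is supported as an extension in C++ by g++, clang++ and MSVC. So you can define your own union that overlays the vector with a struct or array of bytes. Where possible, though, prefer the intrinsics documented by Intel.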
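A minimal sketch of that union approach follows. The union WireLabelBytes and the helpers set_byte and get_byte are hypothetical names introduced for illustration. Reading the member that was not last written is formally outside ISO C++ and relies on the compiler extension mentioned above.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical union overlaying the vector with its 16 bytes.
union WireLabelBytes {
    __m128i vec;
    std::int8_t i8[16];
};

// Hypothetical helper: return a copy of v with byte idx replaced.
inline __m128i set_byte(__m128i v, unsigned idx, std::int8_t value) {
    WireLabelBytes u;
    u.vec = v;
    u.i8[idx & 15] = value;  // idx is masked to stay inside the 16 bytes
    return u.vec;            // type-punning read: compiler extension in C++
}

// Hypothetical helper: read byte idx of v.
inline std::int8_t get_byte(__m128i v, unsigned idx) {
    WireLabelBytes u;
    u.vec = v;
    return u.i8[idx & 15];
}
```

For example, set_byte(_mm_setzero_si128(), 3, 42) yields a vector whose byte 3 is 42 and whose other bytes are 0. Compilers may turn this into a store and reload through memory, so it is better suited to occasional element access than to inner loops.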

0

Source: https://habr.com/ru/post/1694827/

