AVX intrinsics with 4-bit integers

I need to perform the following operation:

 w[i] = scale * v[i] + point

Here scale and point are fixed, while v[] is a vector of 4-bit integers.

I need to calculate w[] for an arbitrary input vector v[], and I want to speed up the process using AVX intrinsics. However, v[] is a vector of 4-bit integers.
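For reference, a plain scalar version of this operation could look like the sketch below. It assumes two 4-bit values per byte with the first value in the low nibble, and results wrapped to 4 bits. The post fixes neither the packing order nor the width of w[], so both are assumptions, and the function name is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for w[i] = scale * v[i] + point on packed 4-bit values.
 * Assumed layout (not stated in the question): byte k holds v[2k] in its
 * low nibble and v[2k+1] in its high nibble; results wrap modulo 16. */
void affine_nibbles_ref(const uint8_t *v, uint8_t *w, size_t nbytes,
                        unsigned scale, unsigned point)
{
    for (size_t k = 0; k < nbytes; ++k) {
        unsigned lo = v[k] & 0x0Fu;          /* v[2k]   */
        unsigned hi = v[k] >> 4;             /* v[2k+1] */
        lo = (scale * lo + point) & 0x0Fu;   /* keep 4 bits, drop overflow */
        hi = (scale * hi + point) & 0x0Fu;
        w[k] = (uint8_t)(lo | (hi << 4));
    }
}
```

Any vectorized variant can be checked against this function byte by byte.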

The question is how to perform operations on 4-bit integers using intrinsics. I could use 8-bit integers and perform the operations that way, but is there any way to do the following:

[a,b] + [c,d] = [a+c,b+d]

[a,b] * [c,d] = [a*c,b*d]

(Ignore overflow)

Using AVX intrinsics, where [..., ...] is an 8-bit integer and a, b, c, d are 4-bit integers?
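The 8-bit fallback mentioned above can be sketched in portable C, with a 64-bit word standing in for a SIMD register. This only illustrates the widening idea for addition and is not tuned AVX code; the function name is hypothetical.

```c
#include <stdint.h>

/* Nibble-wise addition of 16 packed 4-bit values held in a 64-bit word,
 * done by widening each nibble to its own 8-bit lane first. */
uint64_t add_nibbles_widened(uint64_t a, uint64_t b)
{
    const uint64_t low4 = 0x0F0F0F0F0F0F0F0FULL; /* low nibble of each byte */

    /* Even-indexed nibbles stay in place, odd-indexed ones move down by 4. */
    uint64_t a_even = a & low4, a_odd = (a >> 4) & low4;
    uint64_t b_even = b & low4, b_odd = (b >> 4) & low4;

    /* Every byte now holds a value <= 15, so a byte sum is <= 30 and never
     * carries into the neighbouring byte. Masking wraps it modulo 16. */
    uint64_t s_even = (a_even + b_even) & low4;
    uint64_t s_odd  = (a_odd  + b_odd)  & low4;

    /* Move the odd results back up and merge. */
    return s_even | (s_odd << 4);
}
```

With AVX2 the same shape would presumably use _mm256_and_si256, _mm256_srli_epi16, _mm256_add_epi8, _mm256_slli_epi16 and _mm256_or_si256. AVX2 has no 8-bit shift, which is why the mask follows the 16-bit shift.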



Just a partial answer (addition only), in pseudocode (should be easy to extend to AVX2 intrinsics):

uint8_t a, b;          // input containing two nibbles each

uint8_t c = a + b;     // add with (unwanted) carry between nibbles
uint8_t x = a ^ b ^ c; // bits which are result of a carry
x &= 0x10;             // only bit 4 is of interest
c -= x;                // undo carry of lower to upper nibble

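If you know that either a or b has bit 4 unset (i.e. the lowest bit of the upper nibble), it can be left out of the computation of x.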
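As a sketch, the addition trick can be wrapped in a function and applied to whole byte arrays, since every byte is independent. The last function illustrates the shortcut for an operand with bit 4 clear, using an even 4-bit constant replicated into both nibbles. All function names here are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* The carry-undo trick applied to one byte (two nibbles). */
static uint8_t add_nibbles(uint8_t a, uint8_t b)
{
    uint8_t c = (uint8_t)(a + b);              /* add, carry may cross nibbles */
    uint8_t x = (uint8_t)((a ^ b ^ c) & 0x10); /* carry into bit 4, if any     */
    return (uint8_t)(c - x);                   /* remove that carry            */
}

/* Same trick over whole arrays of packed nibbles. */
void add_nibbles_array(const uint8_t *a, const uint8_t *b,
                       uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = add_nibbles(a[i], b[i]);
}

/* Shortcut when the second operand has bit 4 clear: b drops out of x.
 * The mask 0x0E forces point to be an even 4-bit value, so bit 4 of the
 * replicated constant b is always 0. */
uint8_t add_even_point(uint8_t a, unsigned point)
{
    uint8_t b = (uint8_t)((point & 0x0Eu) * 0x11u); /* point in both nibbles */
    uint8_t c = (uint8_t)(a + b);
    uint8_t x = (uint8_t)((a ^ c) & 0x10);
    return (uint8_t)(c - x);
}
```

In an AVX2 version each step would presumably map to one intrinsic per 32 bytes: _mm256_add_epi8, _mm256_xor_si256, _mm256_and_si256 with a broadcast 0x10 mask, and _mm256_sub_epi8. Because the 8-bit add already stops carries at byte boundaries, only the nibble boundary inside each byte needs fixing.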

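As for multiplication: if scale is the same for all products, you can probably get away with some shifting and adding/subtracting (masking out overflow bits where necessary). Otherwise, I'm afraid you need to mask out 4 bits of each 16-bit word, do the operation, and merge the results at the end. Pseudocode (there is no 8-bit multiplication in AVX, so we need to work with 16-bit words):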

uint16_t m0=0xf, m1=0xf0, m2=0xf00, m3=0xf000; // masks for each nibble

uint16_t a, b; // input containing 4 nibbles each.

uint16_t p0 = ((unsigned)a*b) & m0; // lowest nibble, does not require masking a,b (cast avoids signed overflow in plain C)
uint16_t p1 = ((a>>4) * (b&m1)) & m1;
uint16_t p2 = ((a>>8) * (b&m2)) & m2;
uint16_t p3 = ((a>>12)* (b&m3)) & m3;

uint16_t result = p0 | p1 | p2 | p3;  // join results together 
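For the fixed-scale case from the question, the shift-and-add route might look like the following sketch. It works on one byte (two nibbles) at a time, wraps results modulo 16, and reuses the carry-undo addition from above. The function names and the keep[] table are illustrative assumptions, not part of the original answer.

```c
#include <stdint.h>

/* Nibble-wise add of two bytes: add, then undo the carry into bit 4. */
static uint8_t add_nibbles(uint8_t a, uint8_t b)
{
    uint8_t c = (uint8_t)(a + b);
    uint8_t x = (uint8_t)((a ^ b ^ c) & 0x10);
    return (uint8_t)(c - x);
}

/* Multiply both nibbles of v by a fixed 4-bit scale, modulo 16, using only
 * shifts, masks and nibble-wise adds (bits of scale above bit 3 are ignored). */
uint8_t mul_nibbles_const(uint8_t v, unsigned scale)
{
    /* keep[k] clears the bits that a left shift by k pushes from the low
     * nibble into the high nibble, and from the high nibble out of the byte. */
    static const uint8_t keep[4] = {0xFF, 0xEE, 0xCC, 0x88};
    uint8_t acc = 0;
    for (unsigned k = 0; k < 4; ++k)
        if (scale & (1u << k))
            acc = add_nibbles(acc, (uint8_t)((v << k) & keep[k]));
    return acc;
}

/* w = scale * v + point on both nibbles of one byte, wrapping modulo 16. */
uint8_t affine_nibbles(uint8_t v, unsigned scale, unsigned point)
{
    uint8_t p = (uint8_t)((point & 0x0Fu) * 0x11u); /* point in both nibbles */
    return add_nibbles(mul_nibbles_const(v, scale), p);
}
```

Since scale is fixed, the loop over its bits can be unrolled ahead of time, leaving at most four shift, mask and add steps per register. When the multiplier differs per element, the 16-bit masking approach above is still needed.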

Source: https://habr.com/ru/post/1677294/

