If / else in SSE intrinsics

I'm trying to optimize a small piece of code using the built-in SSEs (I'm a complete newbie to this topic), but I'm a little fixated on using conditional expressions.

My source code:

unsigned long c; unsigned long constant = 0x12345678; unsigned long table[256]; int n, k; for( n = 0; n < 256; n++ ) { c = n; for( k = 0; k < 8; k++ ) { if( c & 1 ) c = constant ^ (c >> 1); else c >>= 1; } table[n] = c; } 

The purpose of this code is to compute the crc table (a constant can be any polynomial, it does not matter here),

I suppose my optimized code would be something like this:

 __m128 x; __m128 y; __m128 *table; x = _mm_set_ps(3, 2, 1, 0); y = _mm_set_ps(3, 2, 1, 0); //offset for incrementation offset = _mm_set1_ps(4); for( n = 0; n < 64; n++ ) { y = x; for( k = 0; k < 8; k++ ) { //if do something with y //else do something with y } table[n] = y; x = _mm_add_epi32 (x, offset); } 

I don't know how to go through the if-else statement, but I suspect there is a smart trick. Does anyone know how to do this?

(Besides this, my optimization is probably pretty bad - any advice or corrections on it will be considered with the greatest sympathy)

+6
source share
3 answers

You can completely get rid of if / else. Back in the days when I released the assembly MMX code, it was a common programming activity. Let me start with a series of transformations in the “false” statement:

 c >>= 1; c = c >> 1; c = 0 ^ (c >> 1); 

Why did I introduce an exclusive or? Because exclusive or also found in the "true" statement:

 c = constant ^ (c >> 1); 

Pay attention to the similarities? In the "true" part, we have xor with a constant, and in the false part - xor with zero.

Now I will tell you about a series of transformations for the entire if / else statement:

 if (c & 1) c = constant ^ (c >> 1); // same as before else c = 0 ^ (c >> 1); // just different layout if (c & 1) c = constant ^ (c >> 1); else c = (constant & 0) ^ (c >> 1); // 0 == x & 0 if (c & 1) c = (constant & -1) ^ (c >> 1); // x == x & -1 else c = (constant & 0) ^ (c >> 1); 

Now the two branches differ only from the second argument in binary - and which can be calculated trivially from the condition itself, which allows us to get rid of if / else:

 c = (constant & -(c & 1)) ^ (c >> 1); 

Disclaimer: This solution only works in a two-component architecture, where -1 means "all bits are set."

+12
source

The idea in SSE is to collect both results and then combine the results together.

eg.

 __m128i mask = ...; // some way to build mask[n] = 0x1 __m128i constant = ...; __m128i tmp_c = _mm_xor_si128( _mm_srli_epis32( c, 1 ), constant ); __m128i tmp_c2 = _mm_srli_epis32( c, 1 ); __m128i v = _mm_cmpeq_epi32( c, mask ); tmp_c = _mm_and_epi32( tmp_c, mask ); tmp_c2 = _mm_andnot_si128( mask, tmp_c2 ); c = _mm_or_si128( tmp_c, tmp_c2 ); // or in sse4_1 c = _mm_blendv_epi8( tmp_c, tmp_c2, mask ); 

Please note that this is not a complete code, only to demonstrate the principle.

+2
source

The first step in effective CRC calculation is to use a wider base unit than a bit. See here for an example of how to make this byte for each byte.

+1
source

Source: https://habr.com/ru/post/890126/


All Articles