Find all 2-bit values ​​that match another binary pattern and then sum them up

First value:

I have a binary value, which is actually a compact series of 2-bit values. (That is, every 2 bits in a binary value represents 0, 1, 2, or 3.) So, for example, 0, 3, 1, 2 becomes 00110110. In this binary string, all I care about is 3 (or alternately I could flip the bits and only care about 0 if that makes your answer easier). All other figures are irrelevant (for reasons that we complicate a bit).

Second value:

I have a second binary value, which is also a compacted series of two-bit values ​​represented in the same way. It has the same length with the first value.

Maths:

I want the sum of 2-bit numbers in the second value to have the same position as 3 from the first value. In other words, if I have:

First: 11000011 Second: 01111101 

Then my answer will be “2” (I added the first number and the last number from the “Second” together, because those were the only ones that had “11” in the first value that corresponded to them.)

I want to do this in a few clock cycles (both on the GPU and on the x86 architecture). However, I'm generally looking for an algorithm, not an assembler solution. Is there a faster way than masking two bits at a time from each number and starting multiple loops?

+6
source share
1 answer

Of course.

  // the two numbers unsigned int a; unsigned int b; 

Now create a mask from the one that contains the bit “1”, in the odd position, only if it was “11” ending in the same position.

  unsigned int mask = a & (a >> 1) & 0x55555555; 

Expand it to return the pattern "11":

  mask = mask | (mask << 1); 

So, if a was 1101100011, the mask is 1100000011.

Then mask b using the mask:

  b = b & mask; 

Then you can add (masked) numbers from b in parallel:

  b = (b & 0x33333333) + ((b & 0xcccccccc) >> 2); b = (b & 0x0f0f0f0f) + ((b & 0xf0f0f0f0) >> 4); b = (b & 0x00ff00ff) + ((b & 0xff00ff00) >> 8); b = (b & 0x0000ffff) + ((b & 0xffff0000) >> 16); 

For a 32-bit number, the sum is now on the low-order bits of b. This is a well-known template for adding bit fields in parallel. For more than 32-bit numbers, you would add one more round for 64-bit numbers and two rounds for 128-bit numbers.

+11
source

Source: https://habr.com/ru/post/915552/


All Articles