First value:
I have a binary value that is actually a compact series of 2-bit values. (That is, each pair of bits in the binary value represents 0, 1, 2, or 3.) So, for example, 0, 3, 1, 2 becomes 00110110. In this binary string, all I care about is the 3s (or, alternatively, I could flip the bits and only care about the 0s, if that makes your answer easier). All other values are irrelevant (for reasons that are a bit complicated).
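To make the packing concrete, here is a minimal sketch in Python. The function name `pack` is only for illustration, and it assumes the first value in the series lands in the highest-order bits, as in the example above.

```python
def pack(values):
    """Pack a sequence of 2-bit values (0..3) into one integer.

    Assumption: the first value ends up in the most significant bits,
    matching the example 0, 3, 1, 2 -> 00110110.
    """
    packed = 0
    for v in values:
        if not 0 <= v <= 3:
            raise ValueError("each value must fit in 2 bits")
        # Make room for the next 2-bit field, then store it.
        packed = (packed << 2) | v
    return packed


print(format(pack([0, 3, 1, 2]), "08b"))  # 00110110
```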
Second value:
I have a second binary value, which is also a compact series of 2-bit values represented in the same way. It has the same length as the first value.
Maths:
I want the sum of the 2-bit numbers in the second value that sit at the same positions as a 3 in the first value. In other words, if I have:
First: 11000011 Second: 01111101
Then my answer will be “2”. (I added the first and the last 2-bit numbers from “Second” together, 01 + 01, because those are the only positions where “First” has “11”.)
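For reference, the calculation can be pinned down with a straightforward sketch that masks two bits at a time. This is only meant to fix the semantics, not to be fast. The function name `masked_sum` and the `count` parameter (the number of 2-bit fields) are made up for illustration.

```python
def masked_sum(first, second, count):
    """Sum the 2-bit fields of `second` wherever `first` holds 0b11.

    `count` is the number of 2-bit fields packed into each value.
    """
    total = 0
    for i in range(count):
        shift = 2 * i
        # Only positions where the first value is 3 (binary 11) contribute.
        if ((first >> shift) & 0b11) == 0b11:
            total += (second >> shift) & 0b11
    return total


print(masked_sum(0b11000011, 0b01111101, 4))  # 2
```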
I want to do this in as few clock cycles as possible (both on a GPU and on the x86 architecture). However, I'm looking for a general algorithm, not an assembly-language solution. Is there a faster way than masking off two bits at a time from each number and running multiple loops?