It is worth noting that if each bit is padded with some number of leading zeros, then simply adding all the (expanded) input values together yields the bit count for every position at once; we only need to mask or shift to read each count out. The counting itself then becomes trivial, but it raises other questions, such as:
- How do I convert my input into the format I want to work with?
- How do I extract the bit counts after the operation?
- Should I store the input in the converted format to begin with?
- What is the maximum bit count?
In the code below, I decided to support a maximum bit count of 15, but it could easily be extended to 255. I decided to handle only well-formed input (no empty or oversized input arrays). The code the compiler generates for the caller to access the bit-fields will probably involve some shifts or masks, but that is acceptable.
This implementation uses a lookup table for the expansion and, although I have not profiled it, I think it should be a little faster than a bit-by-bit loop.
struct BitCount
{
    unsigned char bit0 : 4;
    unsigned char bit1 : 4;
    unsigned char bit2 : 4;
    unsigned char bit3 : 4;
    unsigned char bit4 : 4;
    unsigned char bit5 : 4;
    unsigned char bit6 : 4;
    unsigned char bit7 : 4;
    unsigned char bit8 : 4;
    unsigned char bit9 : 4;
    unsigned char bitA : 4;
    unsigned char bitB : 4;
    unsigned char bitC : 4;
    unsigned char bitD : 4;
    unsigned char bitE : 4;
    unsigned char bitF : 4;
};

void CountBits(const short *invals, unsigned incount, BitCount &bitcount)
{
    assert(incount && incount <= 0xF && sizeof bitcount == 8);
    static const unsigned expand[256] = {
        ...
        0x01010100, 0x01010101, 0x01010110, 0x01010111, 0x01011000, 0x01011001,
        0x01011010, 0x01011011, 0x01011100, 0x01011101, 0x01011110, 0x01011111, // 5_
        ...
        0x01100100, 0x01100101, 0x01100110, 0x01100111, 0x01101000, 0x01101001,
        0x01101010, 0x01101011, 0x01101100, 0x01101101, 0x01101110, 0x01101111, // 6_
        ...
        0x10110100, 0x10110101, 0x10110110, 0x10110111, 0x10111000, 0x10111001,
        0x10111010, 0x10111011, 0x10111100, 0x10111101, 0x10111110, 0x10111111, // B_
        ...
        0x11110100, 0x11110101, 0x11110110, 0x11110111, 0x11111000, 0x11111001,
        0x11111010, 0x11111011, 0x11111100, 0x11111101, 0x11111110, 0x11111111};
    ...
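The same idea can be sketched in a self-contained form. This is only a minimal illustration under my own assumptions, not the original code: it computes each table entry on the fly instead of using the hand-written 256-entry table, takes `std::uint16_t` inputs instead of `short`, and keeps the sixteen 4-bit counters in a plain 64-bit integer instead of bit-fields. The names `expand_byte`, `count_bits_packed` and `counter_at` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Spread the 8 bits of a byte so that bit k lands in the lowest bit of
// nibble k (for example, 0x54 = 01010100b becomes 0x01010100).
inline std::uint32_t expand_byte(std::uint8_t b) {
    std::uint32_t out = 0;
    for (int k = 0; k < 8; ++k) {
        out |= static_cast<std::uint32_t>((b >> k) & 1u) << (4 * k);
    }
    return out;
}

// For each of the 16 bit positions, count how many inputs have that bit set.
// The 16 counters come back packed as nibbles: counter k is nibble k.
inline std::uint64_t count_bits_packed(const std::uint16_t* vals, unsigned n) {
    assert(n >= 1 && n <= 15);  // a 4-bit counter cannot hold more than 15
    std::uint64_t sum = 0;
    for (unsigned i = 0; i < n; ++i) {
        const std::uint64_t lo = expand_byte(static_cast<std::uint8_t>(vals[i] & 0xFF));
        const std::uint64_t hi = expand_byte(static_cast<std::uint8_t>(vals[i] >> 8));
        sum += lo | (hi << 32);  // one addition updates all 16 counters
    }
    return sum;
}

// Read one counter back out of the packed sum.
inline unsigned counter_at(std::uint64_t packed, unsigned bit) {
    return static_cast<unsigned>((packed >> (4 * bit)) & 0xF);
}
```

For the inputs `{0x0001, 0x8001, 0x0003}`, `counter_at(sum, 0)` is 3, and `counter_at(sum, 1)` and `counter_at(sum, 15)` are both 1.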
EDIT: On a 64-bit system you can skip the table lookup above if you widen each bit counter from 4 to 8 bits, by taking advantage of multiplication.
This is the property we are interested in (a, b, c and d are each 0 or 1, and n is a binary number):
n * (a * 2^3 + b * 2^2 + c * 2^1 + d * 2^0) = ((a * n) << 3) + ((b * n) << 2) + ((c * n) << 1) + ((d * n) << 0)
So, if we move the byte into a 64-bit integer and choose the factor carefully, we can get a 0 or 1 in the lowest bit of each byte of the product. Here is the multiplier:
00000000 00000010 00000100 00001000 00010000 00100000 01000000 10000001
or 0x0002040810204081
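To see where this constant comes from: it has one set bit every 7 positions (bits 0, 7, 14, ..., 49), so the product is the sum of eight copies of the byte shifted left by 0, 7, ..., 49. In copy k, bit k of the byte sits at position 7k + k = 8k, which is the lowest bit of byte k. A small sketch that rebuilds the constant (the helper name `make_multiplier` is hypothetical, and the `constexpr` loop assumes C++14 or later):

```cpp
#include <cstdint>

// Build the multiplier: one set bit every 7 positions, bits 0, 7, ..., 49.
constexpr std::uint64_t make_multiplier() {
    std::uint64_t m = 0;
    for (int k = 0; k < 8; ++k) {
        m |= std::uint64_t{1} << (7 * k);
    }
    return m;
}

// Checked at compile time against the hexadecimal form of the constant.
static_assert(make_multiplier() == 0x0002040810204081ULL,
              "multiplier has bits 0, 7, 14, ..., 49 set");
```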
So, we can expand the byte to 64 bits as follows:
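unsigned char b = ...;
// This can replace the look-up table above when the code keeps
// 8-bit-wide counts instead of 4-bit-wide counts.
// Bit 7 is placed separately: multiplying all 8 bits lets bit 7 of one
// shifted copy collide with bit 0 of the next (try b = 0x81).
unsigned __int64 valx = (((unsigned __int64)(b & 0x7F) * 0x0002040810204081)
                         | ((unsigned __int64)(b & 0x80) << 49))
                        & 0x0101010101010101;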
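A caveat worth checking: if all eight bits of the byte go through the multiplication, the copies shifted by 7 overlap by one bit, so bit 7 of one copy and bit 0 of the next are added in the same position. When both are set (for example 0x81), the carry spills into the lowest bit of the next byte and survives the mask. The sketch below therefore multiplies only the low seven bits and moves bit 7 to bit 56 by hand, then compares the result with a bit-by-bit reference. The function names are hypothetical, and `std::uint64_t` stands in for `unsigned __int64`.

```cpp
#include <cstdint>

// Expand a byte so that bit k of b becomes the lowest bit of byte k.
inline std::uint64_t expand_to_bytes(std::uint8_t b) {
    // The low 7 bits: shifted copies occupy disjoint 7-bit ranges, so no carries.
    const std::uint64_t low7 =
        static_cast<std::uint64_t>(b & 0x7F) * 0x0002040810204081ULL;
    // Bit 7 goes straight to bit 56, the lowest bit of byte 7.
    const std::uint64_t top = static_cast<std::uint64_t>(b & 0x80) << 49;
    return (low7 | top) & 0x0101010101010101ULL;
}

// Reference version: the same result computed one bit at a time.
inline std::uint64_t expand_to_bytes_naive(std::uint8_t b) {
    std::uint64_t out = 0;
    for (int k = 0; k < 8; ++k) {
        out |= static_cast<std::uint64_t>((b >> k) & 1u) << (8 * k);
    }
    return out;
}
```

Summing `expand_to_bytes` over the bytes of up to 255 inputs gives eight 8-bit counters in a single 64-bit value.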
Then, on little-endian systems, we can extract the bit counts like this:
union ResultType { unsigned __int64 result; unsigned char bitcount[8];