Why choose one method of counting bits after another? Well, it really depends on your machine and the problem you are trying to solve. Please note that all of the instructions below are for the basic RISC processor and may not translate well to a more complex beast such as x86.
The HAKMEM algorithm you specified will execute in 13 instructions, but is unlikely to be very fast due to the module operator. By stroking it, it looks like it has a pretty good level of parallelism commands, which should help if your processor is capable of using it.
Bo Persson's algorithm is presented quite quickly ( 2 + 5*pop(x) instructions), but only if the word is sparsely populated. It can also be modified to work with populous words. It also contains branches and does not have a significant level of parallelism commands.
EDIT: The table lookup method can also be very fast, but does access memory. If the whole table is in L1 cache, then this is probably one of the fastest algorithms. If the table is not in the cache, it is almost certainly one of the slowest.
Below is the algorithm of one of the HAKMEM algorithms and is presented in the Hacker Delight book (I highly recommend this book if you are like these things). It is executed in 19 instructions and has no branch. He also does not use division, but has multiplication. It is also very economical in the way it uses registers by reusing the same mask as much as possible. There is still no significant level of parallelism command here that I see.
int pop(unsigned x) { unsigned n; n = (x >> 1) & 0x77777777; x = x - n; n = (n >> 1) & 0x77777777; x = x - n; n = (n >> 1) & 0x77777777; x = x - n; x = (x + (x >> 4)) & 0x0F0F0F0F; x = x * 0x01010101; return x >> 24; }
The Hacker Delight book also presents several even more specialized algorithms for fields 9-8-7 bits wide or using floating point operators. Note that most of the analysis I presented was also partially taken from this book.
The fact is that the load on the truck methods and the only way to make sure that works best in your specific situation is to measure and compare. I really understand that this is a pretty complete answer, but the alternative is to know your processor and compiler inside out.
source share