Looking at the STL that ships with gcc 4.0.0, the bitset methods _Find_first and _Find_next already do what you want. In particular, they use __builtin_ctzl() (described here), which should compile to the appropriate instruction. (I would guess that the same applies to older versions of gcc.)
And the best part is that bitset already does the right thing: a single instruction if the bitset fits into one unsigned long; a loop over the longs if it uses several. In the loop case, the trip count is known at compile time and the body is only a few instructions, so the optimizer can fully unroll it. In other words, it would probably be hard to beat bitset by rolling your own.
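For illustration, here is a minimal sketch of walking the set bits with those two methods. Note that _Find_first and _Find_next are libstdc++ extensions rather than standard C++, so this assumes you are building with gcc's standard library; the helper name set_bit_positions is my own and not part of any library.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Hypothetical helper: collect the indices of all set bits, lowest first.
// _Find_first() and _Find_next() are libstdc++ extensions, not standard C++;
// both return N (the bitset size) when there is no (further) set bit.
template <std::size_t N>
std::vector<std::size_t> set_bit_positions(const std::bitset<N>& bits) {
    std::vector<std::size_t> positions;
    for (std::size_t i = bits._Find_first(); i < N; i = bits._Find_next(i)) {
        positions.push_back(i);
    }
    return positions;
}
```

With a std::bitset<130> that has bits 3, 64 and 129 set, this yields 3, 64, 129 in that order; an empty bitset yields an empty vector because _Find_first() returns N right away.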