Confusion about bsr and lzcnt

I am a little confused about these two instructions. First, let us set aside the special case where the scanned value is 0, in which the result is undefined (bsr) or the operand size in bits (lzcnt). That difference is clear and not part of my question.

Let us take the binary value 0001 1111 1111 1111 1111 1111 1111 1111

According to the Intel specification, the result for lzcnt is 3.

According to the Intel specification, the result for bsr is 28.

lzcnt counts leading zeros, while bsr returns the index of the highest set bit, i.e. its distance from bit 0 (the lsb).

How can both instructions be the same, and how can lzcnt be emulated as bsr if there is no BMI on the CPU? Or is bit 0 the msb in the case of bsr? The two "operation" pseudocode listings in the Intel specification are also different: one counts or indexes from the left, the other from the right.

Maybe someone can shed some light on this. I don't have a processor without the BMI/lzcnt instruction to check whether the fallback to bsr gives the same result (the special case of scanning the value 0 never happens in my code).

+5
2 answers

LZCNT gives the number of leading zero bits. BSR gives the bit index of the most significant 1 bit. So for the nonzero case they effectively do the same thing, except that the result is expressed differently. For a 32-bit operand you can therefore simply subtract the result of BSR from 31 to get the same behavior as LZCNT, i.e. LZCNT == (31 - BSR).
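The relationship between the two results can be sketched in portable C. This is only a minimal model of what the instructions report for a 32-bit operand, not how the hardware computes it; the function names bsr32_model and lzcnt32_model are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: index of the most significant set bit, counted
   from bit 0 (the LSB). Models the bsr result for a nonzero 32-bit
   input. The caller must not pass 0, where bsr is undefined. */
static unsigned bsr32_model(uint32_t x)
{
    unsigned index = 0;
    while (x >>= 1) {   /* shift right until only the top bit is gone */
        index++;
    }
    return index;
}

/* Hypothetical helper: number of leading zero bits, modelling lzcnt
   for a 32-bit operand. Zero is handled separately because the bsr
   model has no defined answer there. */
static unsigned lzcnt32_model(uint32_t x)
{
    if (x == 0) {
        return 32;      /* lzcnt reports the operand size for 0 */
    }
    return 31 - bsr32_model(x);
}
```

For the value from the question, 0x1FFFFFFF, the bsr model yields 28 and the lzcnt model yields 31 - 28 = 3.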

+6

To be clear, there is no working fallback from lzcnt to bsr. What happened is that Intel used the previously redundant encoding rep bsr to encode the new lzcnt instruction. A redundant rep prefix on bsr was generally defined to be ignored, but with the caveat that it may decode differently on future processors (see note 1).

So, if you run lzcnt on a CPU that does not support it, it will execute as bsr. Of course, this fallback is not exactly deliberate, and it gives the wrong result (as Paul R. points out, the two instructions look at the same bit but report it differently). It is simply a consequence of how the new instruction was encoded and how pointless rep prefixes were handled by earlier processors. So the word "fallback" is pretty much entirely inappropriate for lzcnt and bsr.

The situation is more subtle for tzcnt and bsf. They use the same encoding trick: tzcnt has the same encoding as rep bsf, but here the "fallback" mostly works, since tzcnt returns the same value as bsf for all inputs except zero. For a zero input, tzcnt returns 32 (for a 32-bit operand), while bsf leaves the destination undefined.
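The same kind of model makes the tzcnt/bsf case concrete. Again this is only a sketch for a 32-bit operand, and bsf32_model and tzcnt32_model are hypothetical names; the two functions agree on every nonzero input and differ only in whether zero has a defined answer.

```c
#include <stdint.h>

/* Hypothetical helper: index of the least significant set bit.
   Models the bsf result for a nonzero 32-bit input. The caller must
   not pass 0: bsf leaves its destination undefined there, and this
   loop would not terminate. */
static unsigned bsf32_model(uint32_t x)
{
    unsigned index = 0;
    while ((x & 1u) == 0) {   /* skip over trailing zero bits */
        x >>= 1;
        index++;
    }
    return index;
}

/* Hypothetical helper: number of trailing zero bits, modelling tzcnt
   for a 32-bit operand, including the defined result for zero. */
static unsigned tzcnt32_model(uint32_t x)
{
    if (x == 0) {
        return 32;            /* tzcnt reports the operand size for 0 */
    }
    return bsf32_model(x);    /* identical to bsf for nonzero inputs */
}
```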

You cannot really use even this fallback, though: if you never have zero inputs, you could just use bsf, saving a byte and staying compatible with a couple of decades of processors, and if you do have zero inputs, the behavior differs.

So the behavior is perhaps better classified as trivia than as a fallback...


1 This would usually be more or less esoterica, but you could, for example, use rep prefixes where they have no functional effect to lengthen instructions, helping to align subsequent code without inserting explicit nop instructions. Given the "may decode differently in the future" caveat, this would be dangerous when compiling code that might run on any future processor.

+4

Source: https://habr.com/ru/post/1201887/

