Rounding floating-point numbers after addition (guard, round, and sticky bits)

I have not yet been able to find a good explanation for this anywhere on the Internet, so I hope someone here can explain it to me.

I want to add two binary numbers manually:

1.001₂ × 2²
1.010 0000 0000 0000 0000 0011₂ × 2¹

I can add them without any problem. I get the following result after de-normalizing the first number, adding the two, and renormalizing the sum.

1.1100 0000 0000 0000 0000 0011₂ × 2²
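
The hand addition above can be checked with exact rational arithmetic. This is only a minimal sketch: the helper name `parse_binary` is invented for illustration, and the spaces in the bit strings are just visual grouping.

```python
from fractions import Fraction

def parse_binary(bits: str, exponent: int) -> Fraction:
    """Convert a binary string such as '1.001' scaled by 2**exponent to an exact Fraction."""
    whole, _, frac = bits.replace(" ", "").partition(".")
    value = Fraction(int(whole + frac, 2), 2 ** len(frac))
    return value * Fraction(2) ** exponent

a = parse_binary("1.001", 2)
b = parse_binary("1.010 0000 0000 0000 0000 0011", 1)
total = a + b

# Normalize the sum to the form 1.f * 2**2 and list its fraction bits.
scaled = total / 4                 # significand, expected to lie in [1, 2)
as_int = scaled * 2 ** 24          # shift 24 fraction bits into the integer part
assert as_int.denominator == 1     # the sum needs exactly 24 fraction bits, no more
bits = format(as_int.numerator, "b")
significand = bits[0] + "." + bits[1:]
print(significand, "* 2^2")
```

The printed significand has 24 fraction bits, one more than the 23 that a single-precision mantissa field can store.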

The problem is that this number will not fit into single-precision IEEE 754 format without truncating or rounding off one bit. My assignment asks us to put this number into single-precision IEEE 754 format (which, again, is normally not a problem; I can do that easily). It asks us to do this first with guard, round, and sticky bits, and then to repeat it without these bits. However, I'm not quite sure how these bits help with rounding. I would assume that, without the guard, round, and sticky bits, I would simply truncate the last bit (the LSB).

1 answer

Single precision means the mantissa field holds 23 bits (in the 32-bit IEEE 754 format) plus the hidden bit. Therefore, the leading 1 is not stored and drops out of the mantissa.
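
To see the hidden bit concretely, here is a small sketch that splits a single-precision value into its sign, exponent, and mantissa fields using Python's standard `struct` module. The helper name `float32_fields` is invented for this example, and 7.0 (1.11₂ × 2²) is chosen only as a convenient illustration.

```python
import struct

def float32_fields(x: float):
    """Return (sign, unbiased exponent, 23-bit mantissa string) of x stored as float32."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31
    exponent = ((raw >> 23) & 0xFF) - 127   # remove the single-precision bias
    mantissa = raw & 0x7FFFFF               # low 23 bits; the leading 1 is not stored
    return sign, exponent, format(mantissa, "023b")

sign, exponent, mantissa = float32_fields(7.0)
print(sign, exponent, mantissa)
```

For 7.0 the stored mantissa begins with `11`, the two bits after the binary point in 1.11₂. The leading 1 itself does not appear in the field.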

Next, determine the G and R bits, that is, the guard and round bits.

The guard bit is the first bit past bit 0 of the mantissa, that is, the first of the bits being cut off.

The round bit is the second bit past bit 0 of the mantissa. Here the guard bit is 1 and the round bit is 0, since no further bit is present.

The sticky bit is also 0, since there are no ones to the right of the round bit. Therefore, GRS is 100.

Depending on the book or processor used, this usually means rounding to the nearest even. GRS = 100 is an exact tie, so the result goes to the neighbor whose least significant bit is 0. In this case, since the LSB (least significant bit) of the kept mantissa is 1, the number is rounded up to 1100 0000 0000 0000 0000 010 for the mantissa.
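
The rounding step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the function name `round_with_grs` is hypothetical, the rounding mode is assumed to be round to nearest with ties to even, the "without extra bits" case is assumed to mean plain truncation (as the question suggests), and a carry out of the mantissa (which would require renormalizing) is not handled. The final lines cross-check against the conversion done by Python's `struct` module, which on typical platforms rounds to nearest even.

```python
import struct

def round_with_grs(fraction_bits: str, keep: int = 23):
    """Return (truncated, rounded, (G, R, S)) for the bits after the binary point."""
    bits = fraction_bits.replace(" ", "")
    kept = bits[:keep].ljust(keep, "0")
    tail = bits[keep:].ljust(2, "0")        # missing bits count as zeros
    guard = int(tail[0])
    round_bit = int(tail[1])
    sticky = int("1" in tail[2:])           # OR of everything past the round bit
    mantissa = int(kept, 2)
    lsb = mantissa & 1
    # Round up if above the halfway point, or exactly halfway with an odd LSB.
    if guard and (round_bit or sticky or lsb):
        mantissa += 1                       # assumption: no carry out of the field
    return kept, format(mantissa, f"0{keep}b"), (guard, round_bit, sticky)

fraction = "1100 0000 0000 0000 0000 0011"  # fraction bits of the sum above
truncated, rounded, grs = round_with_grs(fraction)
print("GRS      :", grs)
print("truncated:", truncated)
print("rounded  :", rounded)

# Cross-check: build the exact value 1.f * 2**2 and let struct convert it to float32.
clean = fraction.replace(" ", "")
value = int("1" + clean, 2) / 2 ** len(clean) * 2 ** 2
(raw,) = struct.unpack(">I", struct.pack(">f", value))
hardware_mantissa = format(raw & 0x7FFFFF, "023b")
print("float32  :", hardware_mantissa)
```

With these inputs, truncation keeps the mantissa ending in `001`, while rounding with the guard, round, and sticky bits produces the one ending in `010`.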


Source: https://habr.com/ru/post/955140/

