Why is there a loss of precision when converting from int to float in the code below?

int value1 = 123456789;
float value2 = value1;
System.out.println(value1);
System.out.println(value2);

Output:

123456789
1.23456792E8

1 answer

The float type uses the same number of bits as int (32 bits) to represent floating point numbers in a larger range than int uses to represent only integers.

This leads to a loss of precision, since not every int value can be represented exactly by the float type. Only 24 bits are used for the sign and the fraction part of the number (1 sign bit plus 23 fraction bits), while the remaining 8 are used to represent the exponent.

If you assign this int value to a double, there is no loss of precision, since double has 64 bits, and more than 32 of them (52) are used to represent the fraction.
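The difference between the two types can be checked with a round trip. This is a minimal sketch; the class and method names (RoundTrip, exactAsFloat, exactAsDouble) are made up for illustration. The comparison is done in long so that values near Integer.MAX_VALUE are not hidden by the saturating cast back to int.

```java
class RoundTrip {
    // True if n is unchanged after int -> float -> long.
    static boolean exactAsFloat(int n) {
        return (long) (float) n == n;
    }

    // True if n is unchanged after int -> double -> long.
    // This holds for every int, because double has a 53-bit significand.
    static boolean exactAsDouble(int n) {
        return (long) (double) n == n;
    }

    public static void main(String[] args) {
        int value = 123456789;
        System.out.println(exactAsFloat(value));          // false
        System.out.println(exactAsDouble(value));         // true
        // 2^24 is the last point up to which every int fits in a float.
        System.out.println(exactAsFloat(1 << 24));        // true
        System.out.println(exactAsFloat((1 << 24) + 1));  // false
    }
}
```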

Here is a more detailed explanation:

Binary representation of 123456789 as an int:

 00000111 01011011 11001101 00010101
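The bit pattern can be reproduced in Java. A small sketch, with a hypothetical helper name (toGroupedBinary); Integer.toBinaryString omits leading zeros, so the result is padded to 32 digits first.

```java
class IntBits {
    // Formats the 32 bits of an int as four space-separated bytes.
    static String toGroupedBinary(int n) {
        String bits = Integer.toBinaryString(n);  // no leading zeros
        String padded = String.format("%32s", bits).replace(' ', '0');
        return String.join(" ",
                padded.substring(0, 8),
                padded.substring(8, 16),
                padded.substring(16, 24),
                padded.substring(24, 32));
    }

    public static void main(String[] args) {
        // prints 00000111 01011011 11001101 00010101
        System.out.println(toGroupedBinary(123456789));
    }
}
```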

A single-precision floating-point number is constructed from 32 bits using the following formula:

 (-1)^sign * 1.b22 b21 ... b0 * 2^(e-127) 

Here sign is the most significant bit (b31), bits b22 to b0 are the fraction bits, and bits b30 to b23 are the exponent e.
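To see these three fields for the value in the question, the raw bits can be read with Float.floatToIntBits and split with shifts and masks. This is only a sketch: the class and method names are illustrative, and rebuild assumes a normal (not zero, subnormal, infinite or NaN) float.

```java
class FloatFields {
    static int sign(float f)     { return Float.floatToIntBits(f) >>> 31; }
    static int exponent(float f) { return (Float.floatToIntBits(f) >>> 23) & 0xFF; }
    static int fraction(float f) { return Float.floatToIntBits(f) & 0x7FFFFF; }

    // Applies (-1)^sign * 1.fraction * 2^(e-127); normal numbers only.
    static double rebuild(float f) {
        double significand = 1.0 + fraction(f) / (double) (1 << 23);
        double magnitude = Math.scalb(significand, exponent(f) - 127);
        return sign(f) == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        float f = 123456789;  // implicit int -> float conversion
        System.out.println(sign(f));                          // 0
        System.out.println(exponent(f));                      // 153, i.e. 2^26
        System.out.println(Integer.toHexString(fraction(f))); // 6b79a3
        System.out.println((long) rebuild(f));                // 123456792
    }
}
```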

Therefore, when you convert the int 123456789 to float, only the following 25 bits can be kept (the sign bit plus 24 significant bits, the first of which is the implicit leading 1):

 00000111 01011011 11001101 00010101
 -    --- -------- -------- -----

Leading zeros (other than the sign bit) and trailing zeros can be dropped without losing information. That leaves the 3 least significant bits, which do not fit and have to be cleared. We can either subtract 5 to get 123456784:

 00000111 01011011 11001101 00010000
 -    --- -------- -------- -----

or add 3 to get 123456792:

 00000111 01011011 11001101 00011000
 -    --- -------- -------- -----

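Obviously, adding 3 gives the better approximation, because 123456792 is closer to the original value than 123456784, and the conversion rounds to the nearest representable float.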
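The two candidates and the one the conversion actually picks can be computed directly. A rough sketch that assumes a positive int; the class and method names are invented for this example and are not part of any library.

```java
class NearestFloat {
    // Number of low-order bits of a positive int that do not fit
    // into float's 24-bit significand.
    static int droppedBits(int n) {
        int significantBits = 32 - Integer.numberOfLeadingZeros(n);
        return Math.max(0, significantBits - 24);
    }

    // Candidate at or below n: clear the bits that do not fit.
    static long below(int n) {
        return n & ~((1 << droppedBits(n)) - 1);
    }

    // Candidate above n: one step of the dropped precision higher.
    static long above(int n) {
        return below(n) + (1L << droppedBits(n));
    }

    public static void main(String[] args) {
        int n = 123456789;
        System.out.println(droppedBits(n));    // 3
        System.out.println(below(n));          // 123456784 (n - 5)
        System.out.println(above(n));          // 123456792 (n + 3)
        System.out.println((long) (float) n);  // 123456792, the nearer one
    }
}
```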


Source: https://habr.com/ru/post/1013685/

