First of all, since your code does 1.0/a, the result is a double (1.0 is a double literal, 1.0f is a float literal). The C++ (and C) rules convert the narrower operand to the wider type whenever the operands of an operation have different types (so int + char converts the char to int before adding, long + int converts the int to long, and so on).
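The conversion rules described above can be checked at compile time. This is only a small sketch: the function names reciprocal_as_double and reciprocal_as_float are made up for illustration, and the asker's a is assumed to be a float.

```cpp
#include <cassert>
#include <type_traits>

// 1.0 is a double literal, so the float operand is converted to double first.
static_assert(std::is_same<decltype(1.0 / 3.0f), double>::value,
              "double / float yields double");
// 1.0f is a float literal, so the whole expression stays in float.
static_assert(std::is_same<decltype(1.0f / 3.0f), float>::value,
              "float / float yields float");
// char is promoted to int before the addition.
static_assert(std::is_same<decltype(1 + 'a'), int>::value,
              "int + char yields int");
// int is converted to long when the other operand is long.
static_assert(std::is_same<decltype(1L + 1), long>::value,
              "long + int yields long");

// Hypothetical helpers: the same division done in double and in float.
inline double reciprocal_as_double(float a) { return 1.0 / a; }
inline float reciprocal_as_float(float a) { return 1.0f / a; }
```

If the file compiles, all four static_asserts hold. Calling both helpers with the same argument (for example 3.0f) shows that the double version carries more digits than the float version.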
Second, floating-point values have a fixed number of bits for the "number" (the significand). In float that is 23 bits (+1 "hidden" bit), and in double it is 52 bits (+1). You need about 3.3 bits per decimal digit (exactly: log2(10), if we use a decimal representation), so a 24-bit significand gives about 7 digits and a 53-bit significand about 16 digits. Anything printed beyond that is just "noise" caused by the last few bits of the number not coming out evenly when converted to decimal.
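To make the bits-to-digits arithmetic concrete, here is a rough sketch. The helper names decimal_digits and to_decimal are invented for this example, and the static_asserts assume the usual IEEE 754 float and double, which the C++ standard itself does not guarantee.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <limits>
#include <string>

// Assumption: IEEE 754 binary32/binary64, as on virtually all current hardware.
static_assert(std::numeric_limits<float>::digits == 24, "23 stored bits + 1 hidden bit");
static_assert(std::numeric_limits<double>::digits == 53, "52 stored bits + 1 hidden bit");

// Hypothetical helper: how many decimal digits a significand of this size can carry.
inline double decimal_digits(int significand_bits) {
    return significand_bits * std::log10(2.0);  // same as bits / log2(10)
}

// Hypothetical helper: format a value with a fixed number of digits after the point.
inline std::string to_decimal(double value, int digits) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.*f", digits, value);
    return buffer;
}
```

decimal_digits(24) comes out a little above 7 and decimal_digits(53) a little below 16. Printing 1.0f / 3.0f with to_decimal and 20 digits shows correct 3s for the first seven places or so, followed by digits that have nothing to do with 1/3.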
To have infinite precision, we would have to either store the value as a fraction or have an infinite number of bits. We could of course pick some other finite precision, say 100 bits, but I'm sure you would complain about that too, because it would just give about 15 more digits before it "goes wrong".