Basic data structure for float in Python

I have a question about the underlying data structure of float (and its precision) in Python:

    >>> b = 1.4 + 2.3
    >>> b
    3.6999999999999997
    >>> c = 3.7
    >>> c
    3.7000000000000002
    >>> print b, c
    3.7 3.7
    >>> b == c
    False

It seems that the values of b and c depend on the machine: they are the numbers closest to the target values, but not exactly those numbers. What puzzled me was that we got the "correct" numbers from print, and someone told me it was because print was "lying", while the interactive prompt decided to tell us the truth, i.e. to show exactly what is stored.

And my questions are:

1. How does the "lying" work? For example, if a function takes two values and returns whether they are equal, how should I best compare them when the number of decimal digits (the precision) is unknown, as with b and c above? Is there a well-defined algorithm for this? I was told that every language (C/C++) has this problem with floating-point calculations, but how do they "solve" it?

2. Why can't we just store the actual number instead of the nearest representable number? Is it a hard limitation, or a trade-off for efficiency?

Thank you so much, John

6 answers

To answer your first question, have a look at the following (slightly condensed) code from the Python source:

    #define PREC_REPR 17
    #define PREC_STR  12

    void PyFloat_AsString(char *buf, PyFloatObject *v)
    {
        format_float(buf, 100, v, PREC_STR);
    }

    void PyFloat_AsReprString(char *buf, PyFloatObject *v)
    {
        format_float(buf, 100, v, PREC_REPR);
    }

Basically, repr(float) returns a string formatted to 17 digits of precision, and str(float) returns a string with 12 digits of precision. As you might have guessed, print uses str(), while typing a variable name at the interactive prompt uses repr(). With only 12 digits of precision it looks like you are getting the "correct" answer, but that is only because the value you expect and the value actually stored agree in the first 12 digits.

Here is a brief example of the difference:

    >>> str(.1234567890123)
    '0.123456789012'
    >>> repr(.1234567890123)
    '0.12345678901230001'

As for your second question, I suggest you read the following section of the Python tutorial: Floating Point Arithmetic: Issues and Limitations.

It boils down to efficiency: storing decimal (base-10) numbers in base 2 gives reduced memory use and faster floating-point operations than any other representation, but you have to deal with the inaccuracy.

As JBernardo noted in the comments, this behavior differs in Python 2.7 and later; the following quote from the tutorial linked above describes the difference (using 0.1 as an example):

In versions prior to Python 2.7 and Python 3.1, Python rounded this value to 17 significant digits, giving "0.10000000000000001". In current versions, Python displays a value based on the shortest decimal fraction that rounds correctly back to the true binary value, resulting in simply "0.1".
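You can check this in a current Python 3 interpreter; a small sketch (the `.17g` format spec is just a way to force the old 17-significant-digit behaviour):

```python
# repr() in Python 3 (and 2.7) gives the shortest string that round-trips
# to the same binary value; '.17g' forces the old 17-significant-digit form.
x = 0.1
short = repr(x)            # shortest round-trip form
full = format(x, '.17g')   # full 17-significant-digit form

print(short)             # 0.1
print(full)              # 0.10000000000000001
print(float(short) == x) # True: the short form still round-trips exactly
```

The stored binary value has not changed between versions; only the default way of printing it has.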

---

You should read the famous paper:

What Every Computer Scientist Should Know About Floating-Point Arithmetic

Click on the link that says “CACHED” to download the PDF.

---

You get this result because the numbers 1.4 and 2.3 are themselves not represented exactly. When you add them, their representation errors accumulate.

All floating-point numbers have limited precision, and because they are usually represented internally in base 2 rather than base 10, the limitations apply even to numbers that we humans would consider simple.

Limited precision is rarely a problem for computation, since the precision is still sufficient for most applications. When comparing floating-point numbers, however, the limited precision must be taken into account.

This is usually done by subtracting the numbers and checking whether the difference is small enough relative to the numbers themselves.

So, for example, if:

 abs(b - c) < abs(b) / 1000000000000 

then you can consider them equal. How many digits you take into account depends on the precision of the floating-point type, i.e. whether you use single- or double-precision numbers, and on what calculations produced the values. Since precision limitations accumulate with each calculation, you may need to loosen the threshold before considering them equal.
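A minimal sketch of this subtract-and-compare idea in Python (the function name `approx_equal` and the 1e-12 tolerance are illustrative choices, not a standard API):

```python
def approx_equal(a, b, rel_tol=1e-12):
    """Treat a and b as equal if their difference is tiny relative
    to their magnitude (the subtract-and-check approach above)."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))

b = 1.4 + 2.3
c = 3.7
print(b == c)              # False: the stored doubles differ slightly
print(approx_equal(b, c))  # True: the difference is far below the tolerance
```

A larger `rel_tol` corresponds to demanding fewer matching digits, which is what you want after long chains of calculations.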

When a floating-point number is displayed, it is rounded to its precision. If, for example, it is accurate to 15 digits, it may be rounded to 13 digits before being displayed.

Floating-point numbers are designed for fast calculation. There are other data types, such as Decimal, that can store a number exactly; they are used, for example, to store currency values.
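With the standard decimal module, the addition from the question comes out exact; a quick sketch (note the values are constructed from strings so that no binary rounding sneaks in):

```python
from decimal import Decimal

# Decimal stores base-10 digits exactly, so 1.4 + 2.3 really is 3.7.
b = Decimal('1.4') + Decimal('2.3')
print(b)                    # 3.7
print(b == Decimal('3.7'))  # True, unlike the float version
```

The price is that Decimal arithmetic is implemented in software and is much slower than hardware floats.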

---

Floating-point numbers are inexact; that is a fact of the representation method. There are plenty of explanations of why this happens; suffice it to say that it is an issue on almost any platform that provides floating-point numbers.

The best way to deal with the inexactness is to use a tolerance: comparing two computed floats for equality is problematic because the representations can be off by a tiny amount, so instead you subtract the two and check that the difference is no more than some small amount. Many libraries already have this functionality built in for floats, but it is not particularly difficult to implement yourself when in doubt.
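In Python's standard library, this built-in functionality is math.isclose (available since Python 3.5); a quick sketch:

```python
import math

# math.isclose compares abs(a - b) against a relative tolerance
# (default rel_tol=1e-09), i.e. the subtract-and-check idea above.
b = 1.4 + 2.3
print(b == 3.7)              # False
print(math.isclose(b, 3.7))  # True
```

It also accepts an `abs_tol` parameter for comparisons near zero, where a purely relative tolerance breaks down.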

---

This lecture gives a pretty good idea of how variables are stored in memory, and the professor includes an example that produces exactly the kind of unexpected results you are seeing: http://www.youtube.com/watch?v=jTSvthW34GU

Also, if you compare the numbers after casting them to integers, you will notice that they test equal.
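To actually see how the two values are stored in memory, you can dump their IEEE 754 bit patterns with the standard struct module (the helper name `bits` is just illustrative):

```python
import struct

def bits(x):
    """Return the 64-bit IEEE 754 representation of x as a hex string."""
    return struct.pack('>d', x).hex()

b = 1.4 + 2.3
c = 3.7
print(bits(b))             # the two patterns differ only at the very end
print(bits(c))
print(bits(b) == bits(c))  # False: genuinely different doubles in memory
```

This makes it concrete that `b == c` is False because the stored bit patterns differ, not because of any quirk in the comparison.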

---

All numbers are stored in a limited number of bits, so you cannot always save the actual number and have to live with the closest representable one (imagine the fraction 1/3: if you tried to write it out in decimal on paper, you would exhaust the world's tree resources). An alternative is a symbolic representation, as found for example in Mathematica, which simply stores 1/3 as the pair 1 and 3, but that is further from the machine and makes calculations slower and more complicated.
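Python's standard library has such a rational type too: fractions.Fraction, which stores 1/3 exactly as the numerator/denominator pair (1, 3). A quick sketch:

```python
from fractions import Fraction

third = Fraction(1, 3)   # stored exactly as numerator 1, denominator 3
print(third)             # 1/3
print(third * 3 == 1)    # True: no rounding error at all
print(float(third))      # lossy only once converted back to a float
```

As the answer says, the trade-off is speed: exact rational arithmetic grows the numerator and denominator and is far slower than hardware floats.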

Take a look at some of the links people have posted here and read up on floating-point numbers... it is a little scary, and you may never trust computers again.

---

Source: https://habr.com/ru/post/893023/

