Floating point representations seem to do integer arithmetic correctly - why?

I have been playing a bit with floating point numbers and, based on what I learned about them in the past, the fact that 0.1 + 0.2 results in something like 0.30000000000000004 does not surprise me.

What surprises me, however, is that integer arithmetic always works just fine and does not have any of these artifacts.

I first noticed this in JavaScript (Chrome V8 in node.js):

    0.1 + 0.2 == 0.3                                  // false, NOT surprising
    123456789012 + 18 == 123456789030                 // true
    22334455667788 + 998877665544 == 23333333333332   // true
    1048576 / 1024 == 1024                            // true

C++ (gcc on Mac OS X) seems to have the same properties.

The end result is that integers simply - for lack of a better word - work. It is only when I start using decimal fractions that things become awkward.

Is this a design feature, a mathematical artifact, or some kind of optimization performed by compilers and runtimes?

+4
8 answers

I am writing this under the assumption that JavaScript uses a double-precision floating-point representation for all numbers.

Some numbers have an exact representation in binary floating point format, in particular all integers x with |x| < 2^53. Some numbers do not, in particular fractions such as 0.1 or 0.2, which become infinitely repeating fractions in binary representation.

If all operands and the result of an operation have exact representations, then it is safe to compare the result with == .
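As a quick illustration, here is a minimal sketch in JavaScript (toPrecision just prints more digits of the double that is actually stored; the exact digits shown are what a typical engine produces):

    // Integers below 2^53 are stored exactly, so == behaves as expected.
    console.log(123456789012 + 18 === 123456789030);  // true

    // 0.1 has no finite binary expansion, so the stored double is only close to 0.1.
    console.log((0.1).toPrecision(20));                // "0.10000000000000000555"
    console.log(0.1 + 0.2 === 0.3);                    // false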

Related questions:

Which numbers can only be represented as an approximation in binary?

Why can't decimal numbers be represented exactly in binary?

+3

Is this a design feature, a mathematical artifact, or some kind of optimization performed by compilers and runtimes?

This is a feature of real numbers. A theorem from modern algebra (modern algebra, not high school algebra; math majors take a class in modern algebra after their basic calculus and linear algebra classes) says that, given an integer base b > 1, any positive real number r can be expressed as r = a * b^p, where a is in [1, b) and p is some integer. For example, 1024 = 1.024 * 10^3 in base 10. It is this theorem that justifies our use of scientific notation.

This number a can be classified as terminating (for example, 1.0), repeating (for example, 1/3 = 0.333...) or non-repeating (for example, the representation of pi). There is a slight problem with terminating numbers: any terminating number can also be represented as a repeating number. For example, 0.999... and 1 are the same number. This ambiguity in representation can be resolved by specifying that numbers which can be represented as terminating numbers are represented as such.

What you have discovered is a consequence of the fact that all integers have a terminating representation in any base.

There is a problem with how the reals are represented on a computer. Just as int and long long int do not represent all integers, float and double do not represent all reals. The scheme used on most computers represents a real number r as r = a * 2^p, but with the mantissa (or significand) a truncated to a certain number of bits and the exponent p limited to some finite range. This means that some integers cannot be represented exactly. For example, although a googol (10^100) is an integer, its floating point representation is not exact. The binary representation of a googol is a 333-bit number, and that 333-bit mantissa is truncated to 52 + 1 bits.

The consequence of this is that double precision arithmetic is no longer exact, even for integers, once the integers are greater than 2^53. Try the experiment with unsigned long long int values between 2^53 and 2^64: you will find that double precision arithmetic is no longer exact for these large integers.
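A rough sketch of that experiment in JavaScript (the question's own language) rather than C++, since every JavaScript Number is already a double:

    const limit = Math.pow(2, 53);         // 9007199254740992
    console.log(limit - 1 === limit - 2);  // false: below 2^53 every integer is a distinct double
    console.log(limit + 1 === limit);      // true: 2^53 + 1 has no double of its own and rounds to 2^53
    console.log(Number.MAX_SAFE_INTEGER);  // 9007199254740991, i.e. 2^53 - 1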

+4

Integers within the representable range are represented exactly by the machine; floats are not (well, most of them are not).

If by "basic integer math works" you mean that it behaves like a proper mathematical function, then yes, you can assume that a correctly implemented arithmetic does.

+2

The reason is that you can represent every integer (1, 2, 3, ...) exactly in binary (0001, 0010, 0011, ...).

This is why integer arithmetic is always correct: 0011 - 0001 is always 0010. The problem with floating point numbers is that the part after the decimal point often cannot be converted exactly to binary.
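A small sketch of both halves of that claim in a JavaScript console, using toString(2), which prints the binary expansion of the double that is actually stored:

    console.log((3).toString(2));    // "11"   - every integer has a finite binary form
    console.log((0.5).toString(2));  // "0.1"  - 1/2 is a power of two, so it is exact too
    console.log((0.1).toString(2));  // "0.000110011001100..." - the repeating pattern is cut off
                                     //          at 53 significant bits, so 0.1 is only approximated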

+2

All the cases that you say "work" are ones where the numbers involved can be represented exactly in floating point format. You will find that adding 0.25, 0.5 and 0.125 also works exactly, because those values can likewise be represented exactly as binary floating point numbers.

It is only with values that cannot be represented exactly, like 0.1, that you get what looks like an inaccurate result.
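A small sketch of that distinction in JavaScript (0.25, 0.5 and 0.125 are 1/4, 1/2 and 1/8, each a power of two):

    console.log(0.25 + 0.5 + 0.125 === 0.875);  // true: every value here is a finite binary fraction
    console.log(0.1 + 0.2 === 0.3);             // false: none of these has a finite binary expansion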

+1

Integers are exact because the inaccuracy arises mainly from the way we write decimal fractions, and secondly because many rational numbers simply do not have finite representations in any given base.

See fooobar.com/questions/1105739/... for a full explanation.

+1

This only stops working when you add a sufficiently small integer to a very large integer - and even then, it is because you cannot represent both integers (and their exact sum) in floating point format.
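For example, a small sketch in JavaScript of the small-plus-huge case (10^16 lies above 2^53, where adjacent doubles are 2 apart):

    const big = 1e16;              // 10^16 itself happens to be exactly representable
    console.log(big + 1 === big);  // true:  10^16 + 1 has no double of its own, so the 1 is lost
    console.log(big + 2 === big);  // false: 10^16 + 2 is representable, so this addition is exact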

0

Not all floating point numbers can be represented exactly; this is because of the way they are encoded. The wiki page explains it better than I can: http://en.wikipedia.org/wiki/IEEE_754-1985 . Therefore, when you compare floating point numbers, you should use a delta:

    Math.abs(myFloat - expectedFloat) < delta

You can use the smallest representable floating point number as delta.
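A minimal sketch of such a comparison in JavaScript, with a hypothetical helper nearlyEqual and Number.EPSILON (the gap between 1 and the next larger double) as the delta; for values far from 1 you would normally scale the tolerance, so treat the constant as an illustrative choice only:

    function nearlyEqual(a, b, delta = Number.EPSILON) {
      // Compare the absolute difference against a tolerance instead of using ==.
      return Math.abs(a - b) < delta;
    }

    console.log(0.1 + 0.2 === 0.3);            // false
    console.log(nearlyEqual(0.1 + 0.2, 0.3));  // true: the difference is about 5.5e-17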

-1

Source: https://habr.com/ru/post/1441996/

