Alternative formats for double - IEEE 754

According to the following site: http://en.cppreference.com/w/cpp/language/types

"double - double precision floating point type. Usually IEEE-754 64 bit floating point type."

It says "usually." What other formats or standards can a C++ double use? Which compilers, or which architectures, use an alternative to the IEEE format?

+6
4 answers

Vaxen, Crays, and IBM mainframes, to name just a few that are still in reasonably wide use. Most (all?) of them can also do IEEE floating point, but sometimes only with an optional add-on. In other cases (IBM), IEEE arithmetic can carry a significant speed penalty.

As for older machines, most mainframes (Unisys, Control Data, etc.) used their own floating-point formats, most of which were not even similar to IEEE, let alone conforming to it.

+6

For a short history lesson, you can check out the Intel Floating Point Case Study.

Intel compilers have an option, enabled by default when optimizing, that turns on so-called fast-math. It makes math much faster, but it does not strictly conform to the IEEE standard. You can enforce strict conformance with the fp-model option.

I believe the CUDA language for NVIDIA GPUs also has a significantly faster math library if you are willing to give up strict compliance with the IEEE standard. This not only speeds up the math but also reduces the number of registers used, for transcendental functions in particular.

Whether strict compliance is required depends on the individual case. We had problems with Intel's optimizations and had to enable the fp-model strict option to get correct double-precision results.
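
To illustrate the kind of code that tends to be sensitive to this, here is a minimal sketch (not from the original answer) of compensated (Kahan) summation. It only works if the compiler evaluates each operation exactly as written; value-unsafe optimizations of the fast-math kind are allowed to simplify the compensation term away algebraically. The function names are hypothetical. The `__FAST_MATH__` macro is defined by GCC and Clang under `-ffast-math`; whether other compilers define it is an assumption you would need to check.

```cpp
#include <vector>

// Reports whether the compiler announced fast-math via __FAST_MATH__.
// GCC and Clang define this macro under -ffast-math; other compilers
// may relax IEEE semantics without defining it (assumption: check yours).
constexpr bool fast_math_macro_defined() {
#ifdef __FAST_MATH__
    return true;
#else
    return false;
#endif
}

// Plain left-to-right summation: small terms can be rounded away.
inline double naive_sum(const std::vector<double>& values) {
    double sum = 0.0;
    for (double x : values) {
        sum += x;
    }
    return sum;
}

// Kahan summation: tracks the rounding error of each addition and feeds
// it back into the next one. Algebraically the compensation is always
// zero, so a compiler that reassociates floating-point math may delete it.
inline double kahan_sum(const std::vector<double>& values) {
    double sum = 0.0;
    double compensation = 0.0;
    for (double x : values) {
        const double y = x - compensation;  // apply the stored correction
        const double t = sum + y;           // low-order bits of y may be lost here
        compensation = (t - sum) - y;       // recover what was lost (negated)
        sum = t;
    }
    return sum;
}
```

Comparing `naive_sum` and `kahan_sum` on an input such as 1.0 followed by many tiny values, once with strict settings and once with fast-math, is a quick way to see whether your build still honors the written order of operations.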

+3

It seems most computers today use IEEE-754, but alternatives were in use before it: formats such as excess-128 and packed BCD ( http://aplawrence.com/Basics/floatingpoint.html ). The Wikipedia entry also lists a few: http://en.wikipedia.org/wiki/Floating_point
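
For comparison, IEEE-754 itself stores the exponent in "excess" (biased) form: a 64-bit double uses an 11-bit exponent with a bias of 1023. The sketch below (my own illustration, with hypothetical names) splits a double into its sign, exponent and fraction fields. It assumes double really is IEEE-754 binary64 and refuses to compile otherwise.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559 &&
                  sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes double is IEEE-754 binary64");

// The three fields of a binary64 value.
struct Binary64Fields {
    bool          negative;           // sign bit
    int           biased_exponent;    // stored exponent, 0..2047
    int           unbiased_exponent;  // stored exponent minus 1023 (normal numbers)
    std::uint64_t fraction;           // low 52 bits of the significand
};

inline Binary64Fields split_binary64(double value) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to read the bit pattern

    Binary64Fields fields;
    fields.negative          = (bits >> 63) != 0;
    fields.biased_exponent   = static_cast<int>((bits >> 52) & 0x7FF);
    fields.unbiased_exponent = fields.biased_exponent - 1023;
    fields.fraction          = bits & ((std::uint64_t{1} << 52) - 1);
    return fields;
}
```

A format with a different bias, radix or field widths would need different shifts and masks, which is exactly why code like this should guard itself with a compile-time check.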

+2

It is probably worth adding, in answer to the question "What other formats or standards can a C++ double use?", that gcc for Atmel AVR (8-bit CPUs used in some Arduinos) does not implement double as 64 bits.

See the avr-gcc page on the GCC wiki, in particular the "double" entry under "Deviations from the Standard", which says:

"double is only 32 bits wide and implemented in the same way as float"

I believe other processors have similar implementations, but I could not find any examples.
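
If your code depends on the format, you can ask the implementation instead of guessing. This is a minimal sketch using only `std::numeric_limits` and `sizeof`; the struct and function names are hypothetical, made up for illustration.

```cpp
#include <climits>
#include <limits>

// Summary of how this compiler/target represents double.
struct DoubleFormat {
    int  storage_bits;      // sizeof(double) expressed in bits
    int  significand_bits;  // precision in radix digits, including any implicit bit
    int  radix;             // base of the exponent (2 for IEEE-754 binary formats)
    bool is_iec559;         // true if the implementation claims IEC 60559 / IEEE-754
};

constexpr DoubleFormat describe_double() {
    return DoubleFormat{
        static_cast<int>(sizeof(double) * CHAR_BIT),
        std::numeric_limits<double>::digits,
        std::numeric_limits<double>::radix,
        std::numeric_limits<double>::is_iec559};
}

// True only if double looks like IEEE-754 binary64:
// radix 2, 53 significand bits, 64 bits of storage.
constexpr bool double_is_binary64() {
    return std::numeric_limits<double>::is_iec559 &&
           std::numeric_limits<double>::radix == 2 &&
           std::numeric_limits<double>::digits == 53 &&
           sizeof(double) * CHAR_BIT == 64;
}
```

Since both functions are constexpr, something like `static_assert(double_is_binary64(), "need 64-bit IEEE double");` turns a silent 32-bit double, as described for avr-gcc above, into a compile error.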

0

Source: https://habr.com/ru/post/909206/

