How to use decimal floating point in C++?

According to IEEE 754-2008, there are three binary floating-point formats (which can be encoded using 32, 64 or 128 bits) and two decimal floating-point formats (which can be encoded using 64 or 128 bits).

The table below lists them. In C++, I believe float and double correspond to single and double precision (binary32 and binary64). What class or struct can I use for the decimalX formats, and is there anything I can use for binary128? Are these classes/structures standard or non-standard?

Name       Common name          Base  Significand digits  E min   E max   Decimal digits  Decimal E max
binary32   Single precision     2     23+1                -126    +127    7.22            38.23
binary64   Double precision     2     52+1                -1022   +1023   15.95           307.95
binary128  Quadruple precision  2     112+1               -16382  +16383  34.02           4931.77
decimal32                       10    7                   -95     +96     7               96
decimal64                       10    16                  -383    +384    16              384
decimal128                      10    34                  -6143   +6144   34              6144
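For the binary formats, the two decimal columns follow from the binary ones: multiplying the significand precision p (in bits, counting the implicit leading bit) and E max by log10 2 ≈ 0.30103 gives the approximate number of decimal digits and the decimal exponent range.

```latex
\begin{aligned}
\text{binary32:}  &\quad 24 \log_{10} 2 \approx 7.22,  &\quad 127 \log_{10} 2 \approx 38.23\\
\text{binary64:}  &\quad 53 \log_{10} 2 \approx 15.95, &\quad 1023 \log_{10} 2 \approx 307.95\\
\text{binary128:} &\quad 113 \log_{10} 2 \approx 34.02, &\quad 16383 \log_{10} 2 \approx 4931.77
\end{aligned}
```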
5 answers

In addition to the 32-bit float and 64-bit double, GCC offers __float80, __float128, _Decimal32, _Decimal64 and _Decimal128; on ARM targets it also offers __fp16.

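Intel processors support 80-bit floats in hardware through the old x87 scalar FPU instructions (but not through the SSE vector instructions). I am not aware of any mainstream x86 or ARM processors that support decimal FP types in hardware; IBM's POWER and z/Architecture processors are the notable exceptions.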

It looks like current versions of Microsoft's compiler use 64 bits for both double and long double, whereas older versions gave you 80 bits for long double.

See the documentation here:

C++ does not specify that float must be 32-bit or that double must be 64-bit. It does not even require 8 bits per byte (although there must be at least 8).

[C++11: 3.9.1/8]: There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined. Integral and floating types are collectively called arithmetic types. Specializations of the standard template std::numeric_limits (18.3) shall specify the maximum and minimum values of each arithmetic type for an implementation.

Consult the documentation for your toolchain and platform to find out what sizes they use. It may support a wider long double, which may be what you want.

C++ does not provide decimal types; the only floating-point types are float, double and long double.

Also, C++ does not require them to use IEEE 754 representations or to have any particular size. The only requirement is that double provides at least as much precision as float, and that long double provides at least as much precision as double.

Intel has a decimal floating-point library that works with ICC or GCC on Mac, Linux, HP-UX or Solaris, and with the ICC or CL compilers on Windows. It is not as convenient as using operators on built-in types, but if you use C++, someone may already have written classes that overload all the necessary operators.

If you want the convenience of built-in operators but don't want to write the classes yourself, I would recommend checking out the Bloomberg Finance open-source C++ libraries on GitHub. In particular, the BDE package contains an implementation of the IEEE 754 Decimal32/64/128 types (see bdldfp_decimal.h).

The good thing about this library is that it supports several different IEEE 754 back ends, including the C99 reference implementation, the decNumber implementation that ships with GCC, and Intel's open-source IntelDFP library (see bdldfp_decimalplatform.h). It also supports custom endianness.

Source: https://habr.com/ru/post/1396898/
