According to IEEE 754-2008, there are three binary floating-point basic formats (which can be encoded using 32, 64 or 128 bits) and two decimal floating-point basic formats (which can be encoded using 64 or 128 bits).
The table is below. In C++, I believe that float and double are single and double precision (binary32 and binary64). What class/struct can I use for the decimalX formats, and is there something I can use for binary128? Are these classes/structs standard or non-standard?
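| Name | Common name | Base | Digits | E min | E max | Decimal digits | Decimal E max |
|---|---|---|---|---|---|---|---|
| binary32 | Single precision | 2 | 23+1 | −126 | +127 | 7.22 | 38.23 |
| binary64 | Double precision | 2 | 52+1 | −1022 | +1023 | 15.95 | 307.95 |
| binary128 | Quadruple precision | 2 | 112+1 | −16382 | +16383 | 34.02 | 4931.77 |
| decimal32 | | 10 | 7 | −95 | +96 | 7 | 96 |
| decimal64 | | 10 | 16 | −383 | +384 | 16 | 384 |
| decimal128 | | 10 | 34 | −6143 | +6144 | 34 | 6144 |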
user34537