What is the largest denormalized and normalized number? (64 bit, IEEE 754-1985)

Question

I am working through floating-point arithmetic because I really want to understand this topic!

I know that numbers can be represented in scientific notation.

So, for both numbers, the exponent should look like this:

Denormalized number: 11...11, so (1 + 1/2 + 1/2^2 + ... + 1/2^52) * 2^1023

Normalized number: 11...11, so (1 + 1/2 + 1/2^2 + ... + 1/2^52) * 2^1024

However, I'm not sure whether this is correct.

I really would appreciate your reply!

PS: Wikipedia gives the number, but I do not know how they arrived at it ...

floating-point binary ieee-754 floating-accuracy denormalization

mrquad Nov 19 '13 at 7:14

1 answer

Jeffrey sax · Accepted Answer · 2013-12-14T20:41:30+0000

As you know, the double-precision format is as follows:

[Image: double-precision bit layout, from most to least significant bit: 1 sign bit, 11 exponent bits, 52 fraction bits]

The key to understanding denormalized numbers is that they are not really floating-point numbers. Instead, they use a fixed-point micro-format built from bit patterns that the "normal" format leaves unused.

Normal floating-point numbers have the form m*2^e, where e is obtained by subtracting the bias (1023) from the exponent field above, and m is a number between 1 and 2 whose bits after the binary point are given by the fraction field above. The 1 before the binary point is not stored, because it is always known to be 1. The exponent field takes values from 1 to 2046. The values 0 (all zeros) and 2047 (all ones) are reserved for special purposes.
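To make the decoding rule for normal numbers concrete, here is a minimal Python sketch. It assumes that a Python float is an IEEE 754 double (true on common platforms); the helper names decode_fields and normal_value are made up for this illustration and are not part of the answer.

```python
import struct

def decode_fields(x: float):
    """Split a double into (sign, exponent field, fraction field)."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                    # 1 bit
    exponent_field = (bits >> 52) & 0x7FF  # 11 bits
    fraction = bits & ((1 << 52) - 1)    # 52 bits
    return sign, exponent_field, fraction

def normal_value(sign: int, exponent_field: int, fraction: int) -> float:
    """Rebuild a normal number: (-1)^sign * (1 + fraction/2^52) * 2^(field - 1023)."""
    assert 1 <= exponent_field <= 2046, "only valid for normal numbers"
    m = 1 + fraction / 2**52             # implicit leading 1 plus stored fraction
    return (-1) ** sign * m * 2.0 ** (exponent_field - 1023)

sign, exponent_field, fraction = decode_fields(6.5)
print(sign, exponent_field, fraction)    # 6.5 = 1.625 * 2^2, so the field is 1025
print(normal_value(sign, exponent_field, fraction))
```

Rebuilding the value from the three fields gives back the original number, which is a quick way to check the formula on any normal double.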

All ones in the exponent field mean that we have either infinity or NaN (Not-a-Number).
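A small Python sketch of the all-ones case, assuming a Python float is an IEEE 754 double. The split it uses (zero fraction for infinity, non-zero fraction for NaN) comes from the standard rather than from the answer text, and from_bits is a hypothetical helper name.

```python
import math
import struct

def from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer as a double (hypothetical helper)."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

ALL_ONES_EXPONENT = 0x7FF << 52  # exponent field = 2047, fraction = 0

pos_inf = from_bits(ALL_ONES_EXPONENT)                # fraction zero, sign 0
neg_inf = from_bits((1 << 63) | ALL_ONES_EXPONENT)    # fraction zero, sign 1
a_nan = from_bits(ALL_ONES_EXPONENT | (1 << 51))      # fraction non-zero

print(pos_inf, neg_inf, a_nan)
print(math.isinf(pos_inf), math.isinf(neg_inf), math.isnan(a_nan))
```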

All zeros mean that we are dealing with denormal floating-point numbers. They still have the same form, m*2^e, but the values of m and e are derived differently. m is now a number between 0 and 1, so the digit before the binary point is 0 instead of the 1 used for normal numbers. e always has the same value: -1022. Because the exponent is constant, I called this a fixed-point format above.
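The fixed-point behaviour can be checked with a short Python sketch, again assuming a Python float is an IEEE 754 double. The names from_bits and subnormal_value are illustrative helpers, not something from the answer.

```python
import struct

def from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer as a double (hypothetical helper)."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def subnormal_value(fraction: int) -> float:
    """Exponent field 0: value = (0 + fraction/2^52) * 2^-1022."""
    assert 0 <= fraction < (1 << 52)
    return (fraction / 2**52) * 2.0 ** -1022

# With the exponent field at zero, the raw bit pattern is just the fraction.
print(from_bits(1) == subnormal_value(1))              # smallest positive denormal
print(from_bits(1 << 51) == subnormal_value(1 << 51))  # m = 0.5, i.e. 2^-1023

# Fixed-point behaviour: neighbouring denormals are always the same distance apart.
step = 2.0 ** -1074
print(from_bits(2) - from_bits(1) == step)
print(from_bits(3) - from_bits(2) == step)
```

The constant spacing of 2^-1074 between neighbours is what distinguishes this range from the normal numbers, where the spacing doubles each time the exponent increases.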

Thus, the largest possible values for each of them are:

Normal: (1 + 1/2 + 1/2^2 + ... + 1/2^52) * 2^1023 = (2 - 2^-52) * 2^1023 = 1.797...e+308
Denormal: (0 + 1/2 + 1/2^2 + ... + 1/2^52) * 2^-1022 = (1 - 2^-52) * 2^-1022 = 2.225...e-308
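These two maxima can be cross-checked by building the bit patterns directly. This is a sketch that assumes a Python float is an IEEE 754 double; from_bits is a hypothetical helper, while sys.float_info.max and sys.float_info.min are the standard-library values for the largest finite double and the smallest positive normal double.

```python
import struct
import sys

def from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer as a double (hypothetical helper)."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

ALL_ONES_FRACTION = (1 << 52) - 1

# Largest normal: exponent field 2046 (the largest non-reserved value), fraction all ones.
largest_normal = from_bits((2046 << 52) | ALL_ONES_FRACTION)
# Largest denormal: exponent field 0, fraction all ones.
largest_denormal = from_bits(ALL_ONES_FRACTION)

print(largest_normal == (2 - 2.0 ** -52) * 2.0 ** 1023)     # True
print(largest_normal == sys.float_info.max)                 # True
print(largest_denormal == (1 - 2.0 ** -52) * 2.0 ** -1022)  # True
print(largest_denormal < sys.float_info.min)                # True: just below the smallest normal
print(largest_normal, largest_denormal)
```

The largest denormal sits exactly one step (2^-1074) below the smallest normal number 2^-1022, which is why both print as roughly 2.225e-308.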
