Splitting a double into its sign, exponent, and mantissa

I've read a few threads that split doubles apart and put them back together, but I'm trying to break one down into its basic components. So far I have this:

    breakDouble( double d ){
        long L = *(long*) &d;
        sign;
        long mask = 0x8000000000000000L;
        if( (L & mask) == mask ){
            sign = 1;
        } else {
            fps.sign = 0;
        }
        ...
    }

But I'm pretty confused about how to get the exponent and the mantissa. I got away with forcing the double into a long because only the leading bit mattered, so truncation didn't play a role. For the other parts, though, I don't think this will work, and I know you can't use bitwise operators on floats, so I'm stuck.

Thoughts?


Edit: of course, as soon as I post this, I find a related post, but I'm not sure how much floats and doubles differ in this case.


Edit 2 (sorry for working this out as I go along): I read through the post I linked in edit 1, and it seems to me that I can perform the same operations on my double, with this mask for the exponent:

 mask = 0x7FF0000000000000L; 

and for the mantissa:

 mask = 0xFFFFFFFFFFFFFL; 

Is it correct?

1 answer

The bit masks you posted in your second edit look correct. However, you should be aware that:

  • Dereferencing (long *)&mydouble as you do violates C's aliasing rules. It still works under most compilers if you pass a flag like gcc's -fno-strict-aliasing, but it can lead to problems if you don't. You can cast to char * instead and look at the bits that way. That is more annoying and you need to worry about endianness, but you don't run the risk of the compiler breaking everything. You can also make a union type like the one at the bottom of this post and write to the d member while reading from the other three.

  • A minor portability note: long isn't the same size everywhere; maybe try using uint64_t instead? (double isn't the same everywhere either, but it's fairly clear that this is intended to apply only to IEEE doubles.)

  • The bit-mask trickery only works for so-called "normal" floating-point numbers: those whose biased exponent is neither zero (indicating zero or a subnormal) nor 2047 (indicating infinity or NaN).

  • As Raymond Chen points out, the frexp function does what you actually want. frexp handles the subnormal, infinity, and NaN cases in a documented and sane way, but you pay a performance cost for using it.
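To make the frexp point concrete, here is a minimal sketch for finite inputs. The `parts` struct and the `split_frexp` name are made up for illustration and are not from the question. Note that frexp normalizes the fraction to [0.5, 1), so the exponent it reports is one higher than the unbiased IEEE exponent, and for infinity or NaN the stored exponent is unspecified.

```c
#include <math.h>

/* Hypothetical helper type, not from the original post. */
struct parts {
    int    sign;  /* 1 if the sign bit is set, else 0 */
    double frac;  /* fraction in [0.5, 1), or 0 for zero */
    int    expo;  /* power of two such that |d| == frac * 2^expo */
};

/* Split a finite double using the standard library instead of bit masks.
   Subnormal inputs are normalized by frexp like any other value. */
static struct parts split_frexp(double d)
{
    struct parts p;
    p.sign = signbit(d) ? 1 : 0;        /* also distinguishes -0.0 from 0.0 */
    p.frac = frexp(fabs(d), &p.expo);   /* expo is set to 0 when d is zero */
    return p;
}
```

For example, `split_frexp(-6.0)` gives sign 1, fraction 0.75, and exponent 3, since 6 = 0.75 × 2^3.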

(Apparently there needs to be some non-list text between a list and a code block. Here it is: eat it, markdown!)

    union doublebits {
        double d;
        struct {
            unsigned long long mant : 52;
            unsigned int expo : 11;
            unsigned int sign : 1;
        };
    };
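As an alternative to the union that does not depend on bit-field layout, the masks from the question can be applied to a uint64_t filled in with memcpy, which copies the bytes without breaking the aliasing rules. This is only a sketch, assuming double is a 64-bit IEEE 754 value; the names `double_fields`, `double_kind`, `break_double`, and `classify` are made up for illustration. The `classify` helper shows which biased exponents mark the non-normal cases.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double is a 64-bit IEEE 754 binary64 value. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Hypothetical names, not from the original post. */
struct double_fields {
    unsigned sign;  /* 1 bit */
    unsigned expo;  /* 11-bit biased exponent, 0..2047 */
    uint64_t mant;  /* 52 stored mantissa bits, without the implicit leading 1 */
};

enum double_kind { DK_ZERO, DK_SUBNORMAL, DK_NORMAL, DK_INFINITE, DK_NAN };

static struct double_fields break_double(double d)
{
    uint64_t bits;
    struct double_fields f;

    /* Copy the object representation instead of casting the pointer. */
    memcpy(&bits, &d, sizeof bits);

    f.sign = (unsigned)(bits >> 63);
    f.expo = (unsigned)((bits & UINT64_C(0x7FF0000000000000)) >> 52);
    f.mant = bits & UINT64_C(0x000FFFFFFFFFFFFF);
    return f;
}

/* Biased exponents 0 and 2047 are reserved for the non-normal cases. */
static enum double_kind classify(struct double_fields f)
{
    if (f.expo == 0)
        return f.mant == 0 ? DK_ZERO : DK_SUBNORMAL;
    if (f.expo == 2047)
        return f.mant == 0 ? DK_INFINITE : DK_NAN;
    return DK_NORMAL;
}
```

For example, `break_double(-6.0)` yields sign 1, biased exponent 1025 (2 after subtracting the bias of 1023), and mantissa 0x8000000000000, because -6.0 = -1.5 × 2^2.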

Source: https://habr.com/ru/post/1274701/

