Portable way to serialize float as a 32-bit integer

I have been struggling to find a portable way to serialize 32-bit floating-point variables in C and C++, to be sent to and from microcontrollers. I want the format to be well defined so that serialization and deserialization can also be done from other languages without much effort. Related questions:

Double/float binary serialization portability in C++

Serialize double and float with C

C++ portable conversion of long to double

I know that in most cases a typecast, union, or memcpy will work fine because the float representation is the same, but I would rather have a little more control and peace of mind. So far I have come up with the following:

    #include <math.h>
    #include <stdint.h>

    void serialize_float32(uint8_t* buffer, float number, int32_t *index) {
        int e = 0;
        float sig = frexpf(number, &e);
        float sig_abs = fabsf(sig);
        uint32_t sig_i = 0;

        if (sig_abs >= 0.5) {
            sig_i = (uint32_t)((sig_abs - 0.5f) * 2.0f * 8388608.0f);
            e += 126;
        }

        // Shift in unsigned 32-bit arithmetic so that the result does not
        // depend on the width of int.
        uint32_t res = ((uint32_t)(e & 0xFF) << 23) | (sig_i & 0x7FFFFF);
        if (sig < 0) {
            res |= UINT32_C(1) << 31;
        }

        buffer[(*index)++] = (res >> 24) & 0xFF;
        buffer[(*index)++] = (res >> 16) & 0xFF;
        buffer[(*index)++] = (res >> 8) & 0xFF;
        buffer[(*index)++] = res & 0xFF;
    }

and

    #include <stdbool.h>

    float deserialize_float32(const uint8_t *buffer, int32_t *index) {
        uint32_t res = ((uint32_t) buffer[*index]) << 24 |
                       ((uint32_t) buffer[*index + 1]) << 16 |
                       ((uint32_t) buffer[*index + 2]) << 8 |
                       ((uint32_t) buffer[*index + 3]);
        *index += 4;

        int e = (res >> 23) & 0xFF;
        uint32_t sig_i = res & 0x7FFFFF;
        bool neg = (res & (UINT32_C(1) << 31)) != 0;

        float sig = 0.0;
        if (e != 0 || sig_i != 0) {
            sig = (float)sig_i / (8388608.0 * 2.0) + 0.5;
            e -= 126;
        }

        if (neg) {
            sig = -sig;
        }

        return ldexpf(sig, e);
    }

The frexp and ldexp functions seem to be made for this purpose, but in case they are not available, I also tried to implement them manually using more commonly available functions:

    float frexpf_slow(float f, int *e) {
        if (f == 0.0) {
            *e = 0;
            return 0.0;
        }

        *e = (int)ceilf(log2f(fabsf(f)));
        float res = f / powf(2.0, (float)*e);

        // Make sure that the magnitude stays below 1 so that no overflow occurs
        // during serialization. This seems to be required after doing some manual
        // testing. Halving the significand while incrementing the exponent
        // leaves the represented value unchanged.
        if (res >= 1.0) {
            res *= 0.5f;
            *e += 1;
        }

        if (res <= -1.0) {
            res *= 0.5f;
            *e += 1;
        }

        return res;
    }

and

    float ldexpf_slow(float f, int e) {
        return f * powf(2.0, (float)e);
    }

One thing I have been considering is whether to use 8388608 (2^23) or 8388607 (2^23 - 1) as the multiplier. The documentation says that frexp returns values that are less than 1 in magnitude, and after some experimentation it seems that 8388608 gives results that are bit-accurate with the actual floats; I could not find a single corner case where this overflows. However, this may not be true with another compiler or system. If this can become a problem, a smaller multiplier that slightly reduces accuracy is also fine with me. I know that this does not handle Inf or NaN, but at the moment that is not a requirement.
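The overflow concern can be checked directly. The following is only a minimal sketch, assuming a float with a 24-bit significand (FLT_EPSILON equal to 2^-23); the helper names scale_significand and multiplier_fits_23_bits are hypothetical and not part of the code above. Under that assumption the largest value frexpf can return is 1 - 2^-24, and with a multiplier of 2^23 it maps to exactly 2^23 - 1, which still fits in 23 bits:

```c
#include <float.h>
#include <stdint.h>

/* Hypothetical helper: the same scaling expression used in serialize_float32. */
uint32_t scale_significand(float sig_abs) {
    return (uint32_t)((sig_abs - 0.5f) * 2.0f * 8388608.0f);
}

/* Returns 1 if the largest significand below 1 still fits in 23 bits. */
int multiplier_fits_23_bits(void) {
    /* Assumption: FLT_EPSILON is 2^-23, so this is the largest float below 1. */
    float largest = 1.0f - FLT_EPSILON / 2.0f;
    return scale_significand(largest) <= UINT32_C(0x7FFFFF);
}
```

On a platform where the assumption does not hold, the check would have to be adapted to the actual significand width.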

So finally my question is: does this seem like a reasonable approach, or am I just building a complicated solution that still has portability problems?

+6
2 answers

You seem to have a bug in serialize_float32: the last 4 lines should read:

    buffer[(*index)++] = (res >> 24) & 0xFF;
    buffer[(*index)++] = (res >> 16) & 0xFF;
    buffer[(*index)++] = (res >> 8) & 0xFF;
    buffer[(*index)++] = res & 0xFF;

Your method may not work correctly for infinities and/or NaNs because of the offset of 126 instead of 128. Note that you can validate it by exhaustive testing: there are only about 4 billion 32-bit values, so trying all of them should not take long.
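Such an exhaustive check could look roughly like the sketch below. It assumes the test machine itself stores float as IEEE 754 binary32, so that the native bit pattern obtained with memcpy can serve as the reference. encode_float32 is a condensed copy of the question's encoder that returns the 32-bit word instead of writing bytes, and count_mismatches is a hypothetical helper name:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Condensed copy of the question's encoder: builds the 32-bit pattern
   arithmetically with frexpf, without looking at the float's memory. */
static uint32_t encode_float32(float number) {
    int e = 0;
    float sig = frexpf(number, &e);
    float sig_abs = fabsf(sig);
    uint32_t sig_i = 0;
    if (sig_abs >= 0.5f) {
        sig_i = (uint32_t)((sig_abs - 0.5f) * 2.0f * 8388608.0f);
        e += 126;
    }
    uint32_t res = ((uint32_t)(e & 0xFF) << 23) | (sig_i & 0x7FFFFFu);
    if (sig < 0) {
        res |= UINT32_C(1) << 31;
    }
    return res;
}

/* Counts the bit patterns in [first, last] whose arithmetic encoding differs
   from the native one. Inf and NaN (exponent field 0xFF) are skipped because
   the question explicitly leaves them out of scope. */
uint64_t count_mismatches(uint32_t first, uint32_t last) {
    uint64_t mismatches = 0;
    /* 64-bit counter so the loop terminates even when last == UINT32_MAX. */
    for (uint64_t i = first; i <= last; ++i) {
        uint32_t bits = (uint32_t)i;
        if (((bits >> 23) & 0xFFu) == 0xFFu) {
            continue;
        }
        float f;
        memcpy(&f, &bits, sizeof f);
        if (encode_float32(f) != bits) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling count_mismatches(0, UINT32_MAX) walks every pattern; smaller ranges are handy for a quick check. A full run is also where special cases such as negative zero and subnormal values would show up as mismatches.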

The actual in-memory representation of float values may differ between architectures, but IEEE 754 (or, more precisely, IEC 60559) is largely prevalent today. You can verify whether your particular targets are compliant by checking whether __STDC_IEC_559__ is defined. Note, however, that even if you can assume IEEE 754, you must handle possibly different endianness between the systems. You cannot assume that the endianness of floats is the same as that of integers on the same platform.
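Both checks can be sketched in a few lines. This is only an illustration; the function names float_is_iec559 and float_layout_matches_uint32 are hypothetical. The first reports whether the compiler defines __STDC_IEC_559__, and the second tests at run time whether 1.0f has the bit pattern 0x3F800000 when viewed as a uint32_t, which holds only if float is binary32 and shares the byte order of 32-bit integers:

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the implementation claims IEC 60559 conformance. Some
   compilers do not define the macro even though they use IEEE 754 formats. */
int float_is_iec559(void) {
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if float is 32 bits wide and 1.0f reads back as 0x3F800000,
   i.e. float and uint32_t share the same layout and byte order. */
int float_layout_matches_uint32(void) {
    float one = 1.0f;
    uint32_t bits = 0;
    if (sizeof(one) != sizeof(bits)) {
        return 0;
    }
    memcpy(&bits, &one, sizeof(bits));
    return bits == UINT32_C(0x3F800000);
}
```

If the run-time check fails on a target, a memcpy-based shortcut is not safe there and an arithmetic encoder such as the one in the question would be the fallback.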

Note also that the simple cast would be incorrect: uint32_t res = *(uint32_t *)&number; violates the strict aliasing rule. You should either use a union or use memcpy(&res, &number, sizeof(res));

+4

Assuming the float is in IEEE 754 format, extracting the mantissa, exponent, and sign is fully portable:

    uint32_t internal;
    float value = //...some value
    memcpy( &internal , &value , sizeof( value ) );

    const uint32_t sign     = ( internal >> 31u ) & 0x1u;
    const uint32_t mantissa = ( internal >> 0u  ) & 0x7FFFFFu;
    const uint32_t exponent = ( internal >> 23u ) & 0xFFu;

Invert the procedure to build a float.
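The inverse direction might look like the following sketch, again assuming that float is IEEE 754 binary32. build_float is a hypothetical helper name; it takes the three fields as extracted above, with the exponent still in its biased form:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: packs sign (1 bit), biased exponent (8 bits) and
   mantissa (23 bits) into a float, assuming float is IEEE 754 binary32. */
float build_float(uint32_t sign, uint32_t exponent, uint32_t mantissa) {
    uint32_t internal = ((sign & 0x1u) << 31u)
                      | ((exponent & 0xFFu) << 23u)
                      | (mantissa & 0x7FFFFFu);
    float value;
    memcpy(&value, &internal, sizeof(value));
    return value;
}
```

For example, build_float(0, 127, 0) yields 1.0f, since a biased exponent of 127 corresponds to 2^0 and a zero mantissa to a significand of exactly 1.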

If you only want to send the float as a whole, just copy its bytes into the buffer. This works even if float is not IEEE 754, but it must be 32 bits wide, and the endianness of the integer and floating-point types must be the same:

    buffer[0] = ( internal >> 0u  ) & 0xFFu;
    buffer[1] = ( internal >> 8u  ) & 0xFFu;
    buffer[2] = ( internal >> 16u ) & 0xFFu;
    buffer[3] = ( internal >> 24u ) & 0xFFu;
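The receiving side can reverse these steps. The sketch below assumes the same byte order as the snippet above (least significant byte first) and the same constraints on float; read_float_le is a hypothetical helper name:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reassembles the 32-bit word from four bytes stored
   least significant byte first, then copies it into a float. */
float read_float_le(const uint8_t *buffer) {
    uint32_t internal = (uint32_t)buffer[0]
                      | ((uint32_t)buffer[1] << 8u)
                      | ((uint32_t)buffer[2] << 16u)
                      | ((uint32_t)buffer[3] << 24u);
    float value;
    memcpy(&value, &internal, sizeof(value));
    return value;
}
```

Because the word is rebuilt with shifts rather than by copying the buffer directly, the byte order on the wire stays fixed regardless of the integer endianness of the receiving machine.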
+4

Source: https://habr.com/ru/post/1012013/

