I have been struggling to find a portable way to serialize 32-bit floating-point variables in C and C++ so that they can be sent to and from microcontrollers. I want the format to be well defined so that serialization and deserialization can also be done from other languages without much effort. Related questions:
Double / float binary serialization portability in C++
Serialize double and float with C
C++ portable long to double conversion
I know that in most cases a typecast, a union, or memcpy will work fine because the float representation is the same on both sides, but I would rather have a bit more control and certainty. So far I have come up with the following:
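#include <math.h>
#include <stdint.h>

void serialize_float32(uint8_t *buffer, float number, int32_t *index) {
    int e = 0;
    float sig = frexpf(number, &e);   // |sig| is in [0.5, 1), or 0 for zero
    float sig_abs = fabsf(sig);
    uint32_t sig_i = 0;

    if (sig_abs >= 0.5f) {
        // Drop the implicit leading bit and scale the rest to 23 bits.
        sig_i = (uint32_t)((sig_abs - 0.5f) * 2.0f * 8388608.0f);
        e += 126;
    }

    // Cast before shifting so this also works where int is only 16 bits wide.
    uint32_t res = ((uint32_t)(e & 0xFF) << 23) | (sig_i & 0x7FFFFF);
    if (sig < 0) {
        res |= UINT32_C(1) << 31;     // 1 << 31 would overflow a signed int
    }

    // Big-endian byte order on the wire.
    buffer[(*index)++] = (res >> 24) & 0xFF;
    buffer[(*index)++] = (res >> 16) & 0xFF;
    buffer[(*index)++] = (res >> 8) & 0xFF;
    buffer[(*index)++] = res & 0xFF;
}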
and
#include <stdbool.h>   // only needed in C; bool is built in for C++

float deserialize_float32(const uint8_t *buffer, int32_t *index) {
    // Reassemble the big-endian 32-bit word.
    uint32_t res = ((uint32_t)buffer[*index]) << 24 |
                   ((uint32_t)buffer[*index + 1]) << 16 |
                   ((uint32_t)buffer[*index + 2]) << 8 |
                   ((uint32_t)buffer[*index + 3]);
    *index += 4;

    int e = (int)((res >> 23) & 0xFF);
    uint32_t sig_i = res & 0x7FFFFF;
    bool neg = (res >> 31) & 1;       // avoids the signed overflow in 1 << 31
    float sig = 0.0f;

    if (e != 0 || sig_i != 0) {
        // Restore the implicit leading bit so that sig is back in [0.5, 1).
        sig = (float)sig_i / (8388608.0f * 2.0f) + 0.5f;
        e -= 126;
    }
    if (neg) {
        sig = -sig;
    }
    return ldexpf(sig, e);
}
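To make the wire format concrete for implementations in other languages, here is a worked example for -6.5f. This is only a sketch: pack_fields and check_worked_example are hypothetical helper names, the helper mirrors the field packing of serialize_float32 above, and it only covers normal, non-zero, finite values.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: builds the 32-bit word the same way
 * serialize_float32 does, for normal (non-zero, finite) inputs only. */
static uint32_t pack_fields(float number) {
    int e = 0;
    float sig = frexpf(number, &e);                 /* |sig| in [0.5, 1) */
    uint32_t frac = (uint32_t)((fabsf(sig) - 0.5f) * 2.0f * 8388608.0f);
    uint32_t biased = (uint32_t)(e + 126) & 0xFFu;  /* stored exponent */
    uint32_t sign = (sig < 0.0f) ? 1u : 0u;
    return (sign << 31) | (biased << 23) | (frac & 0x7FFFFFu);
}

/* -6.5 = -0.8125 * 2^3, so:
 *   sign     = 1
 *   exponent = 3 + 126 = 129 (0x81)
 *   fraction = (0.8125 - 0.5) * 2 * 2^23 = 0x500000 */
static void check_worked_example(void) {
    uint32_t word = pack_fields(-6.5f);
    assert(word == 0xC0D00000u);
    /* Big-endian wire bytes: C0 D0 00 00 */
    assert(((word >> 24) & 0xFFu) == 0xC0u);
    assert(((word >> 16) & 0xFFu) == 0xD0u);
    assert(((word >> 8) & 0xFFu) == 0x00u);
    assert((word & 0xFFu) == 0x00u);
}
```

Calling check_worked_example() from main is enough to run it. A decoder in another language can use the same four bytes as a test vector.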
frexp and ldexp seem to be made for this purpose, but in case they are not available, I also tried implementing them manually using more common functions:
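float frexpf_slow(float f, int *e) {
    if (f == 0.0f) {
        *e = 0;
        return 0.0f;
    }

    *e = (int)ceilf(log2f(fabsf(f)));
    float res = f / powf(2.0f, (float)*e);

    // An exact power of two gives |res| == 1.0 here; renormalize so that
    // the magnitude stays in [0.5, 1), as it does with frexpf.
    if (fabsf(res) >= 1.0f) {
        res *= 0.5f;
        *e += 1;
    }

    return res;
}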
and
float ldexpf_slow(float f, int e) {
    return f * powf(2.0f, (float)e);
}
One thing I have been considering is whether to use 8388608 (2^23) or 8388607 (2^23 - 1) as the multiplier. The documentation says that frexp returns values less than 1 in magnitude, and after some experimentation it seems that 8388608 gives results that match the actual floats; I could not find a single corner case where the significand overflows its 23 bits. However, this may not hold with another compiler or system. If this can become a problem, a smaller multiplier that slightly reduces accuracy is also fine with me. I know that this does not handle Inf or NaN, but at the moment that is not a requirement.
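The multiplier question can be checked directly. For a normal float, frexpf returns m / 2^24 with an integer m in [2^23, 2^24), so (sig - 0.5) * 2 * 2^23 equals m - 2^23 exactly and is at most 2^23 - 1. The sketch below tests that on a host machine by comparing against the raw bits obtained with memcpy. It assumes the host float is IEEE 754 binary32, and fraction_from_frexpf, fraction_from_memcpy, and check_multiplier are hypothetical helper names.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the 23 fraction bits as serialize_float32 computes them. */
static uint32_t fraction_from_frexpf(float x) {
    int e = 0;
    float sig_abs = fabsf(frexpf(x, &e));
    return (uint32_t)((sig_abs - 0.5f) * 2.0f * 8388608.0f);
}

/* Reference: the raw fraction bits, assuming float is IEEE 754 binary32. */
static uint32_t fraction_from_memcpy(float x) {
    uint32_t bits;
    assert(sizeof(float) == sizeof bits);
    memcpy(&bits, &x, sizeof bits);
    return bits & 0x7FFFFFu;
}

static void check_multiplier(void) {
    /* Largest magnitude frexpf can return: 1 - 2^-24. */
    float largest = nextafterf(1.0f, 0.0f);
    assert(fraction_from_frexpf(largest) == 0x7FFFFFu);  /* no overflow */

    /* A few normal values; both routes should give the same 23 bits. */
    const float samples[] = {1.0f, -6.5f, 0.1f, 3.14159274f, 16777215.0f};
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(fraction_from_frexpf(samples[i]) ==
               fraction_from_memcpy(samples[i]));
    }
}
```

This only covers normal values. Zero is handled separately in serialize_float32, and subnormals, Inf, and NaN are outside what this check exercises.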
So finally, my question is: does this seem like a reasonable approach, or am I just building a complicated solution that still has portability problems?