Compare a 32-bit float and a 32-bit integer without casting to double, when either value may be too large to fit the other type exactly

I have a 32-bit floating-point number f (known to be positive) that I need to convert to a 32-bit unsigned integer. The value may be too large to fit. In addition, downstream calculations require some safety margin. I can calculate the maximum allowed value m as a 32-bit integer. How can I efficiently determine, in C++11 on a constrained 32-bit machine (ARM M4F), whether f <= m holds mathematically? Note that the types of the two values do not match. Each of the following three approaches has its own problem:

  • static_cast<uint32_t>(f) <= m: I believe this invokes undefined behavior if f does not fit in a 32-bit integer
  • f <= static_cast<float>(m): if m is too large to be converted exactly, the converted value may be larger than m, so the subsequent comparison gives the wrong result in some edge cases
  • static_cast<double>(f) <= static_cast<double>(m): mathematically correct, but requires converting to and working with doubles, which I would like to avoid for efficiency reasons.
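The edge case in the second bullet can be reproduced with a short sketch. The helper names `naive_le`, `exact_le` and `check_examples` are made up for illustration, and the sketch assumes an IEEE 754 binary32 `float` and binary64 `double` with round-to-nearest conversions, as on typical ARM and x86 targets:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: compares after converting m to float (second bullet).
bool naive_le(float f, std::uint32_t m) {
    return f <= static_cast<float>(m);
}

// Hypothetical helper: compares in double, which (under the stated
// assumption) holds every uint32_t and every float exactly (third bullet).
bool exact_le(float f, std::uint32_t m) {
    return static_cast<double>(f) <= static_cast<double>(m);
}

void check_examples() {
    const std::uint32_t m = 4294967295u;  // 2^32 - 1, not representable as float
    const float f = 4294967296.0f;        // 2^32, exactly representable
    assert(naive_le(f, m));   // wrong answer: float(m) was rounded up to 2^32
    assert(!exact_le(f, m));  // right answer: 2^32 > 2^32 - 1
}
```

Near 2^32 adjacent floats are 256 apart, so every m above 2^32 - 128 is rounded up to 2^32 under round-to-nearest.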

Surely there must be a way to convert an integer to a float directly with a specified rounding direction, that is, guaranteeing that the result does not exceed the input in magnitude. I would prefer a standard C++11 solution, but in the worst case a platform-specific one would also be acceptable.
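A minimal sketch of one way to emulate such a round-toward-zero conversion in portable C++11, not necessarily the most efficient one on an M4F. The names `to_float_round_down` and `le_exact` are hypothetical. The sketch relies only on the converted value being one of the two floats adjacent to m, and corrects it with `std::nextafter` when it overshoots:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: returns the largest float that is <= m.
float to_float_round_down(std::uint32_t m) {
    float r = static_cast<float>(m);  // may land on either neighbouring float
    // 2^32 is exactly representable as a float. A result at or above it
    // exceeds every uint32_t, and casting it back would be undefined.
    const float two_pow_32 = 4294967296.0f;
    if (r >= two_pow_32 || static_cast<std::uint32_t>(r) > m) {
        r = std::nextafter(r, 0.0f);  // step to the next float toward zero
    }
    return r;
}

// Since f is itself a float, f <= m holds mathematically exactly when
// f <= (largest float <= m).
bool le_exact(float f, std::uint32_t m) {
    return f <= to_float_round_down(m);
}

void check_examples() {
    assert(to_float_round_down(1000u) == 1000.0f);
    assert(!le_exact(4294967296.0f, 4294967295u));
}
```

Whether this beats a comparison in double on a given target is something only measurement can tell, since `std::nextafter` is typically a library call.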

1 answer

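2³² can be represented exactly as a float. So you can first check that f is small enough to convert safely, and then compare the converted value with the unsigned m.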

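// 2^32 is exactly representable as a float.
const float unsigned_limit = 4294967296.0f;
bool ok = false;
if (f < unsigned_limit)
{
    // f is non-negative and below 2^32, so the cast is well-defined
    // (assuming a 32-bit unsigned int, as on ARM).
    const auto uf = static_cast<unsigned int>(f);
    if (uf <= m)
    {
        ok = true;
    }
}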
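The check above can be wrapped in a function for reuse. This is only a sketch: the name `converts_within` is hypothetical, and the extra guard against negative values and NaN is an addition that the original snippet does not have (it relies on f being known positive):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper: true if f can be converted to uint32_t safely and
// the converted (truncated) value does not exceed m.
bool converts_within(float f, std::uint32_t m) {
    const float unsigned_limit = 4294967296.0f;  // 2^32, exact in float
    if (!(f >= 0.0f && f < unsigned_limit)) {
        return false;  // negative, too large, or NaN: do not cast
    }
    return static_cast<std::uint32_t>(f) <= m;  // compares the truncated value
}

void check_examples() {
    assert(converts_within(4294967040.0f, 4294967295u));
    assert(!converts_within(4294967296.0f, 4294967295u));
    assert(converts_within(3.5f, 3u));  // 3.5f truncates to 3
}
```

Note that this tests the truncated value, so 3.5f passes against m = 3. That matches the goal of converting f to an integer, but it is not the same as f <= m for a fractional f.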

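If f is usually well below m (or well above it), you can first compare f against float(m)*0.99f (or float(m)*1.01f), and fall back to the exact comparison only when that quick test is inconclusive.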
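One possible reading of the float(m)*0.99f and float(m)*1.01f thresholds mentioned above is a cheap pre-check that settles the clear cases in single precision. This is a sketch under that assumption: the function names are hypothetical, and the exact fallback shown here goes through double, though any exact comparison could take its place:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical exact fallback, used only when the quick test is inconclusive.
bool exact_le(float f, std::uint32_t m) {
    return static_cast<double>(f) <= static_cast<double>(m);
}

// Hypothetical helper: true if f <= m mathematically.
bool le_with_precheck(float f, std::uint32_t m) {
    const float approx = static_cast<float>(m);  // off by a tiny relative error
    // The 1% margins are far wider than the float rounding error, so these
    // two tests never give a wrong answer; they just leave a gap in between.
    if (f < approx * 0.99f) {
        return true;   // clearly below m
    }
    if (f > approx * 1.01f) {
        return false;  // clearly above m
    }
    return exact_le(f, m);  // close call: decide exactly
}

void check_examples() {
    assert(le_with_precheck(1.0f, 1000u));
    assert(!le_with_precheck(2000.0f, 1000u));
    assert(!le_with_precheck(4294967296.0f, 4294967295u));
}
```

Whether the extra branches pay off depends on how often f lands near m, which is worth measuring before adopting this.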


Source: https://habr.com/ru/post/1017092/

