Which is better really depends on the processor and on the range of the integers involved (and using double will resolve most of the range issues).
For modern "big" processors, such as x86-64 and ARM, integer division and floating-point division take roughly the same effort, and converting an integer to a float or vice versa is not a "hard" task (and the conversion at least does the correct rounding directly), so the resulting operations are most likely:
    atmp = (float) a;
    btmp = (float) b;
    resfloat = atmp / btmp;
    return to_int_with_rounding(resfloat);
About four machine instructions.
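As a rough illustration, the four steps above could be written in C++ like this. The function name `round_div_float` is my own invention (it is not the code from the question), and it assumes `b` is non-zero and that both values fit in a float without losing bits.

```cpp
#include <cmath>

// Hypothetical helper, not the code from the question: rounded integer
// division done in single-precision floating point. Assumes b != 0.
int round_div_float(int a, int b) {
    float atmp = static_cast<float>(a);   // int -> float conversion
    float btmp = static_cast<float>(b);   // int -> float conversion
    float resfloat = atmp / btmp;         // one floating-point divide
    // std::round rounds halfway cases away from zero; the cast then
    // converts the already-integral value back to int.
    return static_cast<int>(std::round(resfloat));
}
```

Whether `std::round` ends up as a single instruction or a library call depends on the compiler and the target, so the instruction count is only an estimate.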
On the other hand, your code uses two divisions, one modulo and one multiplication, which is quite likely to take longer on such a processor:
    tmp = a / b;
    tmp1 = a % b;
    tmp2 = tmp1 * 2;
    tmp3 = tmp2 / b;
    tmp4 = tmp + tmp3;
So that is five instructions, and three of them are "divides" (unless the compiler is clever enough to reuse a / b for a % b, but that still leaves two distinct divides).
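For comparison, the integer-only sequence can be sketched as an ordinary C++ function. Again the name `round_div_modulo` is hypothetical, and this is my reading of the five steps rather than the exact code from the question. It assumes `b` is non-zero and that doubling the remainder does not overflow.

```cpp
// Hypothetical helper, not the code from the question: rounded integer
// division using only integer operations. Assumes b != 0.
int round_div_modulo(int a, int b) {
    int tmp = a / b;       // truncated quotient
    int tmp1 = a % b;      // remainder, same sign as a
    int tmp2 = tmp1 * 2;   // can overflow when |b| is larger than INT_MAX / 2
    int tmp3 = tmp2 / b;   // -1, 0 or 1: the rounding correction
    return tmp + tmp3;     // halfway cases end up rounded away from zero
}
```

For example, 7 / 2 gives a quotient of 3 and a remainder of 1, the doubled remainder divided by 2 is 1, so the result is 4.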
Of course, if you are outside the range of integers that a float or double can hold without losing bits (24 bits of significand for float, 53 bits for double), then your method MAY be better (assuming there is no overflow in the integer math).
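To make the range problem concrete, here is a small sketch with two hypothetical helpers (`div_via_float` and `div_via_double` are names I made up). A float can represent every integer exactly only up to 2^24 = 16777216, so 16777217 is already rounded when it is converted, before any division happens.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helpers for illustration only. Both assume b != 0.
std::int32_t div_via_float(std::int32_t a, std::int32_t b) {
    float fa = static_cast<float>(a);     // may lose low bits above 2^24
    float fb = static_cast<float>(b);
    float q = fa / fb;
    return static_cast<std::int32_t>(std::round(q));
}

std::int32_t div_via_double(std::int32_t a, std::int32_t b) {
    double da = static_cast<double>(a);   // exact for every 32-bit integer
    double db = static_cast<double>(b);
    double q = da / db;
    return static_cast<std::int32_t>(std::round(q));
}
```

With these, `div_via_float(16777217, 1)` comes back as 16777216, while `div_via_double(16777217, 1)` returns 16777217. For 32-bit operands, double is wide enough to avoid this particular loss.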
On top of all that, since the first form is the one that "everyone" uses, it is the one that the compiler recognizes and can optimize.
Obviously, the results depend on both the compiler used and the processor it runs on, but these are my results from running the code posted above, compiled with clang++ (v3.9 pre-release, pretty close to the released 3.8).
    round_divide_by_float_casting(): 32.5 ns
    round_divide_by_modulo(): 113 ns
    divide_by_quotient_comparison(): 80.4 ns
However, the interesting thing I find when I look at the generated code:

    xorps %xmm0, %xmm0
    cvtsi2ssl 8016(%rsp,%rbp), %xmm0
    xorps %xmm1, %xmm1
    cvtsi2ssl 4016(%rsp,%rbp), %xmm1
    divss %xmm1, %xmm0
    callq roundf
    cvttss2si %xmm0, %eax
    movl %eax, 16(%rsp,%rbp)
    addq $4, %rbp
    cmpq $4000, %rbp

is that the round is actually a call (to roundf). That really surprises me, but it explains why this version is faster on some machines (particularly more recent x86 processors).
g++ gives better results with -ffast-math, at around:
    round_divide_by_float_casting(): 17.6 ns
    round_divide_by_modulo(): 43.1 ns
    divide_by_quotient_comparison(): 18.5 ns
(This is with the count increased to 100k values.)