Rounding integers

Question

Rounding integers

There is something that prevents me from integer arithmetic in textbooks. More precisely, integer division.

Apparently the preferred method is that it divides the divisor by float, and then rounds the float to the nearest integer, and then returns it to an integer:

#include <cmath> int round_divide_by_float_casting(int a, int b){ return (int) std::roundf( a / (float) b); }

But it looks like a scratch on the left ear with the right hand. I use:

 int round_divide (int a, int b){ return a / b + a % b * 2 / b; }

This is not a breakthrough, but the fact that it is not standard makes me wonder if I have lost anything. Despite my (albeit limited) testing, I could not find a single scenario where the two methods give me different results. Someone came across some kind of scenario where int-> float-> int casting produced more accurate results?

+5

c ++ math integer rounding

Adl a Mar 01 '16 at 7:56

source share

4 answers

Ysc · Answer 1 · 2016-03-01T08:26:16+0000

Arithmetic solution

If someone determined that your functions should return, she would describe it as something close, since " f(a, b) returns the closest integer of the result of dividing a by b in the real ring of divisors."

Thus, the question can be summarized as follows: can we determine this nearest integer using only integer division. I think we can.

There are exactly two candidates as the nearest integer: a / b and (a / b) + 1 (1). The choice is easy, if a % b closer to 0 , as is b , then a / b is our result. If not, (a / b) + 1 is.

You could write something similar, ignoring optimization and best practices:

 int divide(int a, int b) { const int quot = a / b; const int rem = a % b; int result; if (rem < b - rem) { result = quot; } else { result = quot + 1; } return result; }

Although this definition meets the needs, one could optimize it without calculating twice the division of a by b using std::div() :

 int divide(int a, int b) { const std::div_t dv = std::div(a, b); int result = dv.quot; if (dv.rem >= b - dv.rem) { ++result; } return result; }

An analysis of the previously conducted problem guarantees us a clear behavior of our implementation.

(1) There is only one last thing to check: how does it behave when a or b negative? This remains to the reader;).

Benchmark

 #include <iostream> #include <iomanip> #include <string> // solutions #include <cmath> #include <cstdlib> // benchmak #include <limits> #include <random> #include <chrono> #include <algorithm> #include <functional> // // Solutions // namespace { int round_divide_by_float_casting(int a, int b) { return (int)roundf(a / (float)b); } int round_divide_by_modulo(int a, int b) { return a / b + a % b * 2 / b; } int divide_by_quotient_comparison(int a, int b) { const std::div_t dv = std::div(a, b); int result = dv.quot; if (dv.rem >= b - dv.rem) { ++result; } return result; } } // // benchmark // class Randomizer { std::mt19937 _rng_engine; std::uniform_int_distribution<int> _distri; public: Randomizer() : _rng_engine(std::time(0)), _distri(std::numeric_limits<int>::min(), std::numeric_limits<int>::max()) { } template<class ForwardIt> void operator()(ForwardIt begin, ForwardIt end) { std::generate(begin, end, std::bind(_distri, _rng_engine)); } }; class Clock { std::chrono::time_point<std::chrono::steady_clock> _start; public: static inline std::chrono::time_point<std::chrono::steady_clock> now() { return std::chrono::steady_clock::now(); } Clock() : _start(now()) { } template<class DurationUnit> std::size_t end() { return std::chrono::duration_cast<DurationUnit>(now() - _start).count(); } }; // // Entry point // int main() { Randomizer randomizer; std::array<int, 1000> dividends; // SCALE THIS UP (1'000'000 would be great) std::array<int, dividends.size()> divisors; std::array<int, dividends.size()> results; randomizer(std::begin(dividends), std::end(dividends)); randomizer(std::begin(divisors), std::end(divisors)); { Clock clock; auto dividend = std::begin(dividends); auto divisor = std::begin(divisors); auto result = std::begin(results); for ( ; dividend != std::end(dividends) ; ++dividend, ++divisor, ++result) { *result = round_divide_by_float_casting(*dividend, *divisor); } const float unit_time = clock.end<std::chrono::nanoseconds>() / static_cast<float>(results.size()); std::cout << std::setw(40) << "round_divide_by_float_casting(): " << std::setprecision(3) << unit_time << " ns\n"; } { Clock clock; auto dividend = std::begin(dividends); auto divisor = std::begin(divisors); auto result = std::begin(results); for ( ; dividend != std::end(dividends) ; ++dividend, ++divisor, ++result) { *result = round_divide_by_modulo(*dividend, *divisor); } const float unit_time = clock.end<std::chrono::nanoseconds>() / static_cast<float>(results.size()); std::cout << std::setw(40) << "round_divide_by_modulo(): " << std::setprecision(3) << unit_time << " ns\n"; } { Clock clock; auto dividend = std::begin(dividends); auto divisor = std::begin(divisors); auto result = std::begin(results); for ( ; dividend != std::end(dividends) ; ++dividend, ++divisor, ++result) { *result = divide_by_quotient_comparison(*dividend, *divisor); } const float unit_time = clock.end<std::chrono::nanoseconds>() / static_cast<float>(results.size()); std::cout << std::setw(40) << "divide_by_quotient_comparison(): " << std::setprecision(3) << unit_time << " ns\n"; } }

results

 g++ -std=c++11 -O2 -Wall -Wextra -Werror main.cpp && ./a.out round_divide_by_float_casting(): 54.7 ns round_divide_by_modulo(): 24 ns divide_by_quotient_comparison(): 25.5 ns

Demo

The performance of the two arithmetic solutions does not differ (their benchmark converges when scaling the size of the scanner).

Mats petersson · Answer 2 · 2016-03-01T08:22:53+0000

It will really depend on the processor, and an integer range that is better (and using double solve most range problems)

For modern "large" processors, such as x86-64 and ARM, integer division and floating point division are approximately the same effort, and converting an integer to a float or vice versa is not a "difficult" task (and does the correct rounding directly in this conversion, at least), therefore, most likely, the resulting operations.

 atmp = (float) a; btmp = (float) b; resfloat = divide atmp/btmp; return = to_int_with_rounding(resfloat)

About four machine instructions.

On the other hand, your code uses two divisions, one modulo and multiplication, which is pretty likely on such a processor.

 tmp = a/b; tmp1 = a % b; tmp2 = tmp1 * 2; tmp3 = tmp2 / b; tmp4 = tmp + tmp3;

So, there are five instructions, and three of them are “divided” (if only the compiler is smart enough to reuse a / b for a % b - but it still has two separate divisions).

Of course, if you are outside the range of numbers that a float or double can contain without losing a digit (23 bits for float, 53 bits for double), then your method MAY be better (if there is no overflow in integer math).

Besides all this, since the first form is used by "everyone", it is one that the compiler recognizes and can optimize.

Obviously, the results depend on both the compiler used and the processor on which it works, but these are my results when running the code that came out above compiled through clang++ (v3.9-pre-release, pretty close to 3.8 released).

  round_divide_by_float_casting(): 32.5 ns round_divide_by_modulo(): 113 ns divide_by_quotient_comparison(): 80.4 ns

However, the interesting thing I find when I look at the generated code is:

 xorps %xmm0, %xmm0 cvtsi2ssl 8016(%rsp,%rbp), %xmm0 xorps %xmm1, %xmm1 cvtsi2ssl 4016(%rsp,%rbp), %xmm1 divss %xmm1, %xmm0 callq roundf cvttss2si %xmm0, %eax movl %eax, 16(%rsp,%rbp) addq $4, %rbp cmpq $4000, %rbp # imm = 0xFA0 jne .LBB0_7

is that round is actually a challenge. Which really surprises me, but explains why on some machines (especially later x86 processors) this happens faster.

g++ gives better results with -ffast-math , which gives about:

  round_divide_by_float_casting(): 17.6 ns round_divide_by_modulo(): 43.1 ns divide_by_quotient_comparison(): 18.5 ns

(with an increased counter up to 100 thousand values)

teroi · Answer 3 · 2016-03-01T08:02:53+0000

Prefer a standard solution. Use the family of std :: div functions declared in cstdlib.

See: http://en.cppreference.com/w/cpp/numeric/math/div

edit: casting to float, and then int can be very inefficient on some architectures, ex microcontrollers.

Adl a · Answer 4 · 2016-03-01T09:08:02+0000

Thanks for the suggestions so far. to shed some light, I made a test setup to compare performance.

 #include <iostream> #include <string> #include <cmath> #include <cstdlib> #include <chrono> using namespace std; int round_divide_by_float_casting(int a, int b) { return (int)roundf(a / (float)b); } int round_divide_by_modulo(int a, int b) { return a / b + a % b * 2 / b; } int divide_by_quotient_comparison(int a, int b) { const std::div_t dv = std::div(a, b); int result = dv.quot; if (dv.rem <= b - dv.rem) { ++result; } return result; } int main() { int itr = 1000; //while (true) { auto begin = chrono::steady_clock::now(); for (int i = 0; i < itr; i++) { for (int j = 10; j < itr + 1; j++) { divide_by_quotient_comparison(i, j); } } auto end = std::chrono::steady_clock::now(); cout << "divide_by_quotient_comparison(,) function took : " << chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << endl; begin = chrono::steady_clock::now(); for (int i = 0; i < itr; i++) { for (int j = 10; j < itr + 1; j++) { round_divide_by_float_casting(i, j); } } end = std::chrono::steady_clock::now(); cout << "round_divide_by_float_casting(,) function took : " << chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << endl; begin = chrono::steady_clock::now(); for (int i = 0; i < itr; i++) { for (int j = 10; j < itr + 1; j++) { round_divide_by_modulo(i, j); } } end = std::chrono::steady_clock::now(); cout << "round_divide_by_modulo(,) function took : " << chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << endl; //} return 0; }

The results obtained on my machine (i7 with vs2015) were as follows: the arithmetic modulo was about two times faster than the casting method int-> float-> int . a method based on std :: div_t (suggested by @YSC and @teroi) was faster than int-> float-> int , but slower than the modulo arithmetic method.

EDIT A second test was performed to avoid certain compiler optimizations noted by @YSC: #include #include #include #include #include #include using the namespace std;

 int round_divide_by_float_casting(int a, int b) { return (int)roundf(a / (float)b); } int round_divide_by_modulo(int a, int b) { return a / b + a % b * 2 / b; } int divide_by_quotient_comparison(int a, int b) { const std::div_t dv = std::div(a, b); int result = dv.quot; if (dv.rem <= b - dv.rem) { ++result; } return result; } int main() { int itr = 100; vector <int> randi, randj; for (int i = 0; i < itr; i++) { randi.push_back(rand()); int rj = rand(); if (rj == 0) rj++; randj.push_back(rj); } vector<int> f, m, q; while (true) { auto begin = chrono::steady_clock::now(); for (int i = 0; i < itr; i++) { for (int j = 0; j < itr; j++) { q.push_back( divide_by_quotient_comparison(randi[i] , randj[j]) ); } } auto end = std::chrono::steady_clock::now(); cout << "divide_by_quotient_comparison(,) function took : " << chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << endl; begin = chrono::steady_clock::now(); for (int i = 0; i < itr; i++) { for (int j = 0; j < itr; j++) { f.push_back( round_divide_by_float_casting(randi[i], randj[j]) ); } } end = std::chrono::steady_clock::now(); cout << "round_divide_by_float_casting(,) function took : " << chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << endl; begin = chrono::steady_clock::now(); for (int i = 0; i < itr; i++) { for (int j = 0; j < itr; j++) { m.push_back( round_divide_by_modulo(randi[i], randj[j]) ); } } end = std::chrono::steady_clock::now(); cout << "round_divide_by_modulo(,) function took : " << chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << endl; cout << endl; f.clear(); m.clear(); q.clear(); } return 0; }

In this second test, the slowest was divide_by_quotient() , depending on std :: div_t , then divide_by_float() , and the fastest was divide_by_modulo() . however, this time the difference in performance was much lower than 20%.

Rounding integers

Arithmetic solution

Benchmark

More articles: