Built-in mod ('%') versus user-defined mod function: improving modulo performance

I recently heard that the mod ('%') operator is very slow. So I created a function that works like a % b. But is this actually faster than the mod operator?

Here is my function

 int mod(int a, int b)
 {
     int tmp = a / b;
     return a - (b * tmp);
 }
+6
c++ modular-arithmetic
Oct 25 '15 at 18:27
4 answers

According to Chandler Carruth's benchmarks at CppCon 2015, the fastest modulo operator (on x86, when compiling with Clang) is:

 int fast_mod(const int input, const int ceil)
 {
     // apply the modulo operator only when needed
     // (i.e. when the input is greater than or equal to the ceiling)
     return input >= ceil ? input % ceil : input;
     // NB: the assumption here is that the numbers are positive
 }

I suggest you watch the whole talk; it examines in detail why this method is faster than just using % unconditionally.

+12
Oct 25 '15 at 18:52

This probably depends on the compiler and platform.

But I was curious, and on my system you appear to be right, according to my tests. However, the method from @865719's answer is the fastest:

 #include <chrono>
 #include <iostream>

 class Timer
 {
     using clk = std::chrono::steady_clock;
     using microseconds = std::chrono::microseconds;

     clk::time_point tsb;
     clk::time_point tse;

 public:
     void clear() { tsb = tse = clk::now(); }
     void start() { tsb = clk::now(); }
     void stop()  { tse = clk::now(); }

     friend std::ostream& operator<<(std::ostream& o, const Timer& timer)
     {
         return o << timer.secs();
     }

     // return time difference in seconds
     double secs() const
     {
         if(tse <= tsb)
             return 0.0;
         auto d = std::chrono::duration_cast<microseconds>(tse - tsb);
         return d.count() / 1000000.0;
     }
 };

 int mod(int a, int b)
 {
     int tmp = a / b;
     return a - (b * tmp);
 }

 int fast_mod(const int input, const int ceil)
 {
     // apply the modulo operator only when needed
     // (i.e. when the input is greater than the ceiling)
     return input > ceil ? input % ceil : input;
     // NB: the assumption here is that the numbers are positive
 }

 int main()
 {
     auto N = 1000000000U;
     unsigned sum = 0;
     Timer timer;

     for(auto times = 0U; times < 3; ++times)
     {
         std::cout << "     run: " << (times + 1) << '\n';

         sum = 0;
         timer.start();
         for(decltype(N) n = 0; n < N; ++n)
             sum += n % (N - n);
         timer.stop();
         std::cout << "       %: " << sum << " " << timer << "s" << '\n';

         sum = 0;
         timer.start();
         for(decltype(N) n = 0; n < N; ++n)
             sum += mod(n, N - n);
         timer.stop();
         std::cout << "     mod: " << sum << " " << timer << "s" << '\n';

         sum = 0;
         timer.start();
         for(decltype(N) n = 0; n < N; ++n)
             sum += fast_mod(n, N - n);
         timer.stop();
         std::cout << "fast_mod: " << sum << " " << timer << "s" << '\n';
     }
 }

Compiled with GCC 5.1.1 (x86_64):

 g++ -std=c++14 -march=native -O3 -g0 ... 

Output:

      run: 1
        %: 3081207628 5.49396s
      mod: 3081207628 4.30814s
 fast_mod: 3581207628 2.51296s
      run: 2
        %: 3081207628 5.5522s
      mod: 3081207628 4.25427s
 fast_mod: 3581207628 2.52364s
      run: 3
        %: 3081207628 5.4947s
      mod: 3081207628 4.29646s
 fast_mod: 3581207628 2.56916s
+7
Oct 25 '15 at 7:14

In most cases, your micro-optimized code will not beat the compiler. I also do not know where this "wisdom" comes from that claims the built-in % must be slow. It is as fast as the machine can compute it, with all the micro-optimizations the compiler can apply for you.

Also note that measuring the performance of such very small pieces of code is not an easy task. The surrounding loop constructs or the jitter of your timing measurements can dominate your results. You can find talks on such issues on YouTube, for example by people such as Andrei Alexandrescu or Chandler Carruth. I once wrote a micro-benchmarking framework for a project I was working on. There are a lot of pitfalls, including external ones such as the OS preempting your thread or moving it to another core.

+1
Oct 25 '15 at 18:48

Often a programmer can beat the performance of the remainder operator when the programmer knows something about the operands that the compiler does not. For example, if the divisor is likely to be a power of 2, but is not especially likely to be larger than the value being reduced, you could use something like:

 unsigned mod(unsigned int x, unsigned int y)
 {
     return y & (y - 1) ? x % y : x & (y - 1);
 }

If the compiler inlines the function and the divisor is a constant power of 2, the compiler will replace the remainder operator with a bitwise AND, which can be a significant improvement. In cases where the divisor is not a constant power of two, the generated code will have to do a small test before choosing whether to use the remainder operator, but in cases where the divisor turns out to be a power of two, the savings from the bitwise AND may exceed the cost of the conditional logic.

Another scenario in which a user-defined modulo function can help is when the divisor is a fixed constant for which the compiler does not generate an efficient remainder calculation. For example, if you want to compute x % 65521 on a platform that can perform fast integer shifts and multiplications, you may notice that computing x -= (x>>16)*65521; makes x much smaller without affecting the value of x % 65521. Performing the operation a second time reduces x to the range 0..65745, small enough for one conditional subtraction to yield the correct remainder.

Some compilers may know how to use such techniques to handle the % operator efficiently with a constant divisor, but for those that do not, this approach can be a useful optimization, especially when working with numbers larger than the machine word. [Note that 65521 is 65536-15, so on a 16-bit machine one could evaluate x as x = (x & 65535) + 15*(x >> 16). Not as readable as the form that subtracts 65521 * (x >> 16), but it is easy to see how efficiently it can be processed on a 16-bit machine.]

0
May 2 '17 at 23:23


