The module has already been explained, however, we repeat.
To find the remainder k modulo 2^n-1 , write
k = a + 2^n*b, 0 <= a < 2^n
Then
k = a + ((2^n-1) + 1) * b = (a + b) + (2^n-1)*b β‘ (a + b) (mod 2^n-1)
If a + b >= 2^n , repeat until the remainder is less than 2^n , and if this leads you to a + b = 2^n-1 , replace this with 0. Each "right shift by n and add to the last n bit "moves the first bit of the set to the right by n or n-1 places (if k < 2^(2*n-1) , when the first bit of the set after changing and adding can be bit 2^n ). Therefore, if the width of the type is large compared to n , this will require many shifts - consider a 128-bit type and n = 3 , for large k you will need more than 40 shifts. To reduce the number of shifts needed, you can use the fact that
2^(m*n) - 1 = (2^n - 1) * (2^((m-1)*n) + 2^((m-2)*n) + ... + 2^(2*n) + 2^n + 1),
we will only use the fact that 2^n - 1 divides 2^(m*n) - 1 for all m > 0 . Then you shift the multiples of n , which are about half the maximum bit length that the value in this step can make. For the above example of 128-bit type and residual modulo 7 ( 2^3 - 1 ), the closest multiples from 3 to 128/2 are 63 and 66, the first shift is 63 bits
r_1 = (k & (2^63 - 1)) + (k >> 63) // r_1 < 2^63 + 2^(128-63) < 2^66
to get a number with no more than 66 bits, then shift by 66/2 = 33 bits
r_2 = (r_1 & (2^33 - 1)) + (r_1 >> 33) // r_2 < 2^33 + 2^(66-33) = 2^34
to reach no more than 34 bits. The next shift is 18 bits, then 9, 6, 3
r_3 = (r_2 & (2^18 - 1)) + (r_2 >> 18) // r_3 < 2^18 + 2^(34-18) < 2^19 r_4 = (r_3 & (2^9 - 1)) + (r_3 >> 9) // r_4 < 2^9 + 2^(19-9) < 2^11 r_5 = (r_4 & (2^6 - 1)) + (r_4 >> 6) // r_5 < 2^6 + 2^(11-6) < 2^7 r_6 = (r_5 & (2^3 - 1)) + (r_5 >> 3) // r_6 < 2^3 + 2^(7-3) < 2^5 r_7 = (r_6 & (2^3 - 1)) + (r_6 >> 3) // r_7 < 2^3 + 2^(5-3) < 2^4
Now there is one subtraction if r_7 >= 2^3 - 1 . To calculate k % (2^n -1) in a b-bit type, O (log 2 (b / n)) is needed.
The result is similar, write again
k = a + 2^n*b, 0 <= a < 2^n = a + ((2^n-1) + 1)*b = (2^n-1)*b + (a+b),
so k/(2^n-1) = b + (a+b)/(2^n-1) , and we continue for now a+b > 2^n-1 . Here, unfortunately, we cannot reduce the work by shifting and masking about half the width, so the method is effective only when n not much less than the type width.
Code for quick cases when n not too small:
unsigned long long modulus_2n1(unsigned n, unsigned long long k) { unsigned long long mask = (1ULL << n) - 1ULL; while(k > mask) { k = (k & mask) + (k >> n); } return k == mask ? 0 : k; } unsigned long long quotient_2n1(unsigned n, unsigned long long k) { unsigned long long mask = (1ULL << n) - 1ULL, quotient = 0; while(k > mask) { quotient += k >> n; k = (k & mask) + (k >> n); } return k == mask ? quotient + 1 : quotient; }
In the particular case when n is half the width of the type, the cycle is performed no more than two times, therefore, if the road branches, it may be better to expand the cycle and unconditionally execute the body of the cycle twice.