Fast version: it depends on the hardware, the optimizer, if it divides by a constant or not (pdf), if there are exceptions to check (for example, modulo 0), if and how negative numbers are processed ( this is a terrible question for C ++ ) etc.
R gave a nice, concise answer for unsigned integers, but it's hard to understand if you are not good at C.
The essence of the method illuminated by R is to remove multiples of q until there are no more multiples of q . We could naively do this with a simple loop:
while (p >= q) p -= q;
The code may be short, but for large p and small q, this can take a very long time.
Better than deleting one q at a time would be to delete many q at a time. Note that we really want to drop as many q as possible, i.e. floor(p/q) many q ... And indeed, this is the correct method. For unsigned integers, one would expect p % q == p - (p / q) * q . (Note that an unsigned integer is rounded down.)
But it almost looks like a hoax, because the separation and residue operations are so closely related. (In fact, often if the hardware supports division, it supports the separation and calculation-remainder operation because they are so tightly coupled.)
Assuming we donβt have access to division, how can we find a multiple of q greater than 1 to cancel? In hardware, fixed shift operations are cheap (if not nearly free) and conceptually represent multiplication by the non-negative strength of the two. For example, moving a bit string left by 3 is equivalent to multiplying by 8 (i.e. 2 ^ 3), for example. 5 decimal places is equivalent to the binary '101'. Shift '101' in binary, adding three zeros to the right (giving "101000"), and the result is 50 in decimal β five times eight.
Similarly, shift operations are very cheap, because the operations are with software and you will try to find a controller that does not support them quickly. (Some architectures, such as ARM, may even combine shifts with other instructions to make them βfreeβ for a long time.)
ARMed (cannot resist) with these shift operations, we can act as follows:
- Find out the greatest power of two we can multiply
q by and even less p . - When working with the highest power from two to the smallest, multiply
q by each force of two, and if it is less than the remainder of p subtract it from what remains of p . - All that remains is the remainder.
Why does it work? Because in the end, you will find that all the deductible powers of the two actually add up to floor(p / q) ! Do not take my word for it, similar knowledge has been known for a very long time .
R-response gap:
#define HI (-1U-(-1U/2))
This gives you an unsigned integer with only the highest value set.
unsigned i; for (i=0; !(HI & (q<<i)); i++);
This line actually believes that the maximum power of two q can be multiplied to overflow an unsigned integer. This is not absolutely necessary, but it does not change the results, except to increase the required execution time.
If you are not familiar with C-isms on this line:
(q<<i) - shift of the left bit by i . Recall that this is equivalent to multiplying by 2 ^ i .HI & (q<<i) performs bitwise AND. Since HI has only its upper bit, this will only lead to a nonzero value if (q<<i) is large enough so that the upper bit is nonzero. Another shift to the left and there will be an overflow of integers.!(HI & (q<<i)) is βtrueβ when (HI & (q<<i)) is zero and false otherwise.
do { if (p >= (q<<i)) p -= (q<<i); } while (i--);
This is a simple shortening loop do { .... } while (i--); . Note that post-decrementing is used on i , so the loop is executed, then it checks if i is not zero, then subtracts one from i , and then, if its early check led to true , it continues. This property has the property that the loop last executes when i is 0. This is important because we might need to delete an unnecessary copy of q .
if (p >= (q<<i)) checks if 2 ^ i * q p less. If so, p -= (q<<i) removes it.
The remainder remains.