How C performs transaction%

I am curious to understand the logic of working with the module, as I understand that bit-shift operations can be performed to perform various actions, such as shifting bits for multiplication.

One way to see how this is done is with a recursive algorithm that continues to share until you can no longer share, but that doesn't seem to be effective.

Any ideas would be helpful. Thanks in advance!

+6
source share
4 answers

Fast version: it depends on the hardware, the optimizer, if it divides by a constant or not (pdf), if there are exceptions to check (for example, modulo 0), if and how negative numbers are processed ( this is a terrible question for C ++ ) etc.

R gave a nice, concise answer for unsigned integers, but it's hard to understand if you are not good at C.

The essence of the method illuminated by R is to remove multiples of q until there are no more multiples of q . We could naively do this with a simple loop:

 while (p >= q) p -= q; // One liner, woohoo! 

The code may be short, but for large p and small q, this can take a very long time.

Better than deleting one q at a time would be to delete many q at a time. Note that we really want to drop as many q as possible, i.e. floor(p/q) many q ... And indeed, this is the correct method. For unsigned integers, one would expect p % q == p - (p / q) * q . (Note that an unsigned integer is rounded down.)

But it almost looks like a hoax, because the separation and residue operations are so closely related. (In fact, often if the hardware supports division, it supports the separation and calculation-remainder operation because they are so tightly coupled.)

Assuming we don’t have access to division, how can we find a multiple of q greater than 1 to cancel? In hardware, fixed shift operations are cheap (if not nearly free) and conceptually represent multiplication by the non-negative strength of the two. For example, moving a bit string left by 3 is equivalent to multiplying by 8 (i.e. 2 ^ 3), for example. 5 decimal places is equivalent to the binary '101'. Shift '101' in binary, adding three zeros to the right (giving "101000"), and the result is 50 in decimal β€” five times eight.

Similarly, shift operations are very cheap, because the operations are with software and you will try to find a controller that does not support them quickly. (Some architectures, such as ARM, may even combine shifts with other instructions to make them β€œfree” for a long time.)

ARMed (cannot resist) with these shift operations, we can act as follows:

  • Find out the greatest power of two we can multiply q by and even less p .
  • When working with the highest power from two to the smallest, multiply q by each force of two, and if it is less than the remainder of p subtract it from what remains of p .
  • All that remains is the remainder.

Why does it work? Because in the end, you will find that all the deductible powers of the two actually add up to floor(p / q) ! Do not take my word for it, similar knowledge has been known for a very long time .

R-response gap:

 #define HI (-1U-(-1U/2)) 

This gives you an unsigned integer with only the highest value set.

 unsigned i; for (i=0; !(HI & (q<<i)); i++); 

This line actually believes that the maximum power of two q can be multiplied to overflow an unsigned integer. This is not absolutely necessary, but it does not change the results, except to increase the required execution time.

If you are not familiar with C-isms on this line:

  • (q<<i) - shift of the left bit by i . Recall that this is equivalent to multiplying by 2 ^ i .
  • HI & (q<<i) performs bitwise AND. Since HI has only its upper bit, this will only lead to a nonzero value if (q<<i) is large enough so that the upper bit is nonzero. Another shift to the left and there will be an overflow of integers.
  • !(HI & (q<<i)) is β€œtrue” when (HI & (q<<i)) is zero and false otherwise.

do { if (p >= (q<<i)) p -= (q<<i); } while (i--);

This is a simple shortening loop do { .... } while (i--); . Note that post-decrementing is used on i , so the loop is executed, then it checks if i is not zero, then subtracts one from i , and then, if its early check led to true , it continues. This property has the property that the loop last executes when i is 0. This is important because we might need to delete an unnecessary copy of q .

if (p >= (q<<i)) checks if 2 ^ i * q p less. If so, p -= (q<<i) removes it.

The remainder remains.

+14
source

While most C implementations run on hardware with a division instruction, the stop operation can be performed in much the same way as when calculating p%q , assuming unsigned values:

 #define HI (-1U-(-1U/2)) unsigned i; for (i=0; !(HI & (q<<i)); i++); do { if (p >= (q<<i)) p -= (q<<i); } while (i--); 

The resulting residue is in p .

+5
source

In addition to hardware instruction and implementation using shifts, as R .. suggests , there is also mutual multiplication .

This method can be used when the right side of % is a constant known at compile time.
Mutual multiplication is used to implement division, but using it for % easy, based on the formula a%b == a-(a/b)*b .

+2
source

Depending on the skills of the optimizer, there is a shortcut for the base with module 2. For example, a % 32 can be implemented as a & 31 . In the general case, a % (2^N) == a & (2^N -1) . This is lightning fast compared to fission. Most divisors (ever hardware) require at least 1 cycle for each bit of the result to compute, while AND logic is just a few loop operations (in the pipeline).

EDIT: this only works if a is unsigned!

0
source

Source: https://habr.com/ru/post/948161/


All Articles