How can I optimize C / x86 code?

Question

How can I optimize C / x86 code?

int lcm_old(int a, int b) { int n; for(n=1;;n++) if(n%a == 0 && n%b == 0) return n; } int lcm(int a,int b) { int n = 0; __asm { lstart: inc n; mov eax, n; mov edx, 0; idiv a; mov eax, 0; cmp eax, edx; jne lstart; mov eax, n; mov edx, 0; idiv b; mov eax, 0; cmp eax, edx; jnz lstart; } return n; }

I am trying to beat / match the code for the top function using my own function (bottom). Do you have any ideas how I can optimize my routine?

PS. It is just for fun.

+4

optimization assembly x86 greatest-common-divisor lcm

user181351 Feb 08 '10 at 16:09

source share

2 answers

I would optimize using a different algorithm. Searching linearly, as you do, is very slow. It is a fact that the least common mulitple of two natural numbers is the quotient of their product divided by their largest common divisor. You can quickly calculate the largest common factor using the Euclidean algorithm .

Thus:

 int lcm(int a, int b) { int p = a * b; return p / gcd(a, b); }

where you need to implement gcd(int, int) . Since the average number of steps in the Euclidean algorithm is O(log n) , we defeat the naive linear search for hands down.

There are other approaches to this problem. If you had an algorithm that could quickly determine integers (say, a quantum computer ), then you can also solve this problem. If you write each of a and b in your canonical factorization

 a = p_a0^e_a0 * p_a1^e_a1 * ... * p_am^e_am b = p_b0^e_b0 * p_b1^e_b1 * ... * p_bn^e_bn

then the least common multiple of a and b is obtained by taking for each prime factor that appears in at least one of the factorizations a and b , taking it with the maximum exponent that it appears when factorizing a or b . For instance:

 28 = 2^2 * 7 312 = 2^3 * 39

so that

 lcm(28, 312) = 2^3 * 7 * 39 = 2184

All this indicates that naive approaches deserve admiration for their simplicity, but you can spend all day optimizing every last nanosecond of them and still not surpass the excellent algorithm.

+14

jason Feb 08 '10 at 16:13

source share

Jerry Coffin · Accepted Answer · 2010-02-08T16:30:43+0000

I am going to assume that you want to keep the same algorithm. This should be at least a slightly more efficient implementation. The main difference is that the code in the loop uses registers, not memory.

 int lcm(int a,int b) { __asm { xor ecx, ecx mov esi, a mov edi, b lstart: inc ecx mov eax, ecx xor edx, edx idiv esi test edx, edx jne lstart mov eax, ecx; idiv edi test edx, edx jnz lstart mov eax, ecx leave ret } }

However, as Jason noted, this is really not a very efficient algorithm - multiplication, GCD search, and division will usually be faster (if a and b quite small).

Edit: there is another algorithm that is almost easier to understand, which should also be much faster (than the original, not multiplication, and then division by GCD). Instead of generating consecutive numbers until you find one that divides both a and b , generate consecutive multiple units (preferably more) until you find one that evenly divides into another:

 int lcm2(int a, int b) { __asm { xor ecx, ecx mov esi, a mov edi, b lstart: add ecx, esi mov eax, ecx xor edx, edx idiv edi test edx, edx jnz lstart mov eax, ecx leave ret } }

It remains dead simply to understand, but should significantly improve the original.

How can I optimize C / x86 code?

More articles: