Python code optimization (20 times slower than C)

Question

Python code optimization (20 times slower than C)

I wrote this very poorly optimized C code that does a simple math calculation:

#include <stdio.h> #include <math.h> #include <stdlib.h> #define MIN(a, b) (((a) < (b)) ? (a) : (b)) #define MAX(a, b) (((a) > (b)) ? (a) : (b)) unsigned long long int p(int); float fullCheck(int); int main(int argc, char **argv){ int i, g, maxNumber; unsigned long long int diff = 1000; if(argc < 2){ fprintf(stderr, "Usage: %s maxNumber\n", argv[0]); return 0; } maxNumber = atoi(argv[1]); for(i = 1; i < maxNumber; i++){ for(g = 1; g < maxNumber; g++){ if(i == g) continue; if(p(MAX(i,g)) - p(MIN(i,g)) < diff && fullCheck(p(MAX(i,g)) - p(MIN(i,g))) && fullCheck(p(i) + p(g))){ diff = p(MAX(i,g)) - p(MIN(i,g)); printf("We have a couple %llu %llu with diff %llu\n", p(i), p(g), diff); } } } return 0; } float fullCheck(int number){ float check = (-1 + sqrt(1 + 24 * number))/-6; float check2 = (-1 - sqrt(1 + 24 * number))/-6; if(check/1.00 == (int)check) return check; if(check2/1.00 == (int)check2) return check2; return 0; } unsigned long long int p(int n){ return n * (3 * n - 1 ) / 2; }

And then I tried (just for fun) to pass it under Python to see how it would react. My first version was an almost 1: 1 conversion that worked horribly slow (120 + secs in Python vs <1sec in C). I optimized a bit, and here is what I got:

 #!/usr/bin/env/python from cmath import sqrt import cProfile from pstats import Stats def quickCheck(n): partial_c = (sqrt(1 + 24 * (n)))/-6 c = 1/6 + partial_c if int(c.real) == c.real: return True c = c - 2*partial_c if int(c.real) == c.real: return True return False def main(): maxNumber = 5000 diff = 1000 for i in range(1, maxNumber): p_i = i * (3 * i - 1 ) / 2 for g in range(i, maxNumber): if i == g: continue p_g = g * (3 * g - 1 ) / 2 if p_i > p_g: ma = p_i mi = p_g else: ma = p_g mi = p_i if ma - mi < diff and quickCheck(ma - mi): if quickCheck(ma + mi): print ('New couple ', ma, mi) diff = ma - mi cProfile.run('main()','script_perf') perf = Stats('script_perf').sort_stats('time', 'calls').print_stats(10)

It works in about 16 seconds, which is better, but also almost 20 times slower than C. Now I know that C is better than Python for this kind of calculation, but what I would like to know is what I missed (Python-wise, like an awfully slow function or such) that could make this function Faster. Please note that I am using Python 3.1.1 if that matters

+4

performance optimization python math

user280590 Feb 24 '10 at 18:33

source share

6 answers

Since quickCheck is called about 25,000,000 times, you might want to use memoization to cache responses.

You can do memoization in C as well as Python. In C, it will also be much faster.

You calculate 1/6 on each iteration of quickCheck. I'm not sure if this will be optimized by Python, but if you can avoid rearranging the constant values, you will find that everything is faster. C compilers do this for you.

Doing things like if condition: return True; else: return False if condition: return True; else: return False , stupid - and time consuming. Just do return condition .

In Python 3.x, /2 should create floating point values. For this you need integers. You must use division //2 . It will be closer to version C in terms of what it does, but I don’t think it is much faster.

Finally, Python is usually interpreted. The interpreter will always be significantly slower than C.

+17

S. Lott Feb 24 '10 at 18:45

source share

Since the function p () increases monotonically, you can avoid comparing the values, since g> i implies p (g)> p (i). In addition, the inner loop can be broken earlier, since p (g) - p (i)> = diff means p (g + 1) - p (i)> = diff.

Also for the sake of correctness, I changed the equality comparison in quickCheck to compare the difference with epsilon, because the exact floating-point comparison is pretty fragile.

On my machine, this reduced execution time to 7.8 ms using Python 2.6. Using PyPy with JIT reduced it to 0.77 ms.

This shows that before moving on to micro-optimization, he pays to look for algorithmic optimizations. Microoptimization greatly complicates the determination of algorithmic changes for relatively minor advantages.

 EPS = 0.00000001 def quickCheck(n): partial_c = sqrt(1 + 24*n) / -6 c = 1/6 + partial_c if abs(int(c) - c) < EPS: return True c = 1/6 - partial_c if abs(int(c) - c) < EPS: return True return False def p(i): return i * (3 * i - 1 ) / 2 def main(maxNumber): diff = 1000 for i in range(1, maxNumber): for g in range(i+1, maxNumber): if p(g) - p(i) >= diff: break if quickCheck(p(g) - p(i)) and quickCheck(p(g) + p(i)): print('New couple ', p(g), p(i), p(g) - p(i)) diff = p(g) - p(i)

+5

Ants aasma Feb 25 '10 at 19:45

source share

There are several python compilers that could make a good bit for you. Take a look at Psyco .

Another way to work with programs that use math intensively is to rewrite most of the work in a mathematical kernel, such as NumPy , so a highly optimized code does the job, and your python-based code only calculates. To get the most out of this strategy, avoid doing calculations in loops and instead let the math kernel do all of this.

+4

SingleNegationElimination Feb 24 '10 at 19:15

source share

Other respondents have already mentioned several optimizations that will help. However, in the long run, you will not be able to map C performance in Python. Python is a great tool, but since it is interpreted, it is not suitable for heavy crunches or other applications where performance is key.

Also, even in your version of C, your inner loop might use some help. Updated Version:

  for(i = 1; i < maxNumber; i++){ for(g = 1; g < maxNumber; g++){ if(i == g) continue; max=i; min=g; if (max<min) { // xor swap - could use swap(p_max,p_min) instead. max=max^min; min=max^min; max=max^min; } p_max=P(max); p_min=P(min); p_i=P(i); p_g=P(g); if(p_max - p_min < diff && fullCheck(p_max-p_min) && fullCheck(p_i + p_g)){ diff = p_max - p_min; printf("We have a couple %llu %llu with diff %llu\n", p_i, p_g, diff); } } } /////////////////////////// float fullCheck(int number){ float den=sqrt(1+24*number)/6.0; float check = 1/6.0 - den; float check2 = 1/6.0 + den; if(check == (int)check) return check; if(check2 == (int)check2) return check2; return 0.0; }

Department, function calls, etc. are expensive. Also, calculating them once and storing them in vars, as I did, can make things more readable.

You might consider declaring P () as inline, or rewrite it as a preprocessor macro. Depending on how good your optimizer is, you can do some arithmetic yourself and simplify its implementation.

Your implementation of fullCheck () will return what seems unacceptable, since 1/6 == 0, where 1 / 6.0 will return 0.166 ... as you would expect.

This is a very brief question about what you can do for your C code to improve performance. This will no doubt widen the gap between C and Python performance.

+2

David Lively Feb 24 '10 at 18:47

source share

The 20x difference between Python and C for a number crunching task seems very useful to me.

Check for the usual performance differences for some processor-intensive tasks (keep in mind that the scale is logarithmic).

But look at the bright side, what is 1 minute of processor time compared to the brain and the typing time you saved by writing Python instead of C? :-)

+1

fortran Feb 24 '10 at 18:56

source share

truppo · Accepted Answer · 2010-02-24T19:29:50+0000

I did this from ~ 7 seconds to ~ 3 seconds on my machine:

The calculated i * (3 * i - 1 ) / 2 for each value in yours, it was calculated twice quite a lot.
Cached quickCheck calls
Removed if i == g , adding +1 to range
Removed if p_i > p_g , since p_i is always less than p_g

Also add a quickCheck function inside main to make all variables local (which have a faster search than global). I am sure that there are more opportunities for micro-optimization.

 def main(): maxNumber = 5000 diff = 1000 p = {} quickCache = {} for i in range(maxNumber): p[i] = i * (3 * i - 1 ) / 2 def quickCheck(n): if n in quickCache: return quickCache[n] partial_c = (sqrt(1 + 24 * (n)))/-6 c = 1/6 + partial_c if int(c.real) == c.real: quickCache[n] = True return True c = c - 2*partial_c if int(c.real) == c.real: quickCache[n] = True return True quickCache[n] = False return False for i in range(1, maxNumber): mi = p[i] for g in range(i+1, maxNumber): ma = p[g] if ma - mi < diff and quickCheck(ma - mi) and quickCheck(ma + mi): print('New couple ', ma, mi) diff = ma - mi

Python code optimization (20 times slower than C)

More articles: