Problem Hi guys. I am looking for some python performance recommendations. Some information about my problem:
Considering:
- A
(x,y) grid of nodes, each with a value (0...255) , starting from 0 - List of
N input coordinates of each at a specified location in the range (0...x, 0...y) - The
Z value that defines the "neighborhood" in counting nodes
Increase the value of the node at the input coordinate and the neighbors of the node. Neighbors outside the edge of the grid are ignored. (Without packaging)
BASE CASE: a grid of nodes of size 1024x1024 , with input coordinates of 400 and a range of Z nodes of 75 .
Processing should be O(x*y*Z*N) . I expect x, y, and Z to remain roughly around the values โโin the base case, but the number of input N coordinates can increase to 100,000. My goal is to minimize processing time.
Current Results Between my start and the comments below, we have several implementations.
Speed โโon my 2.26 GHz Intel Core 2 Duo with Python 2.6.1:
f1: 2.819s f2: 1.567s f3: 1.593s f: 1.579s f3b: 1.526s f4: 0.978s
f1 is the initial naive implementation: three nested for loops. f2 replaces the inner for loop with list comprehension. f3 based on Andreiโs assumption in the comments and replaces the external for with map() f - Chris' suggestion in the answers below f3b takes the value kriss f3 f4 - Alex's contribution.
The code below is for your reading.
Question How to reduce processing time? I would prefer sub-1.0s for test parameters.
Please keep recommendations for native Python. I know that I can switch to a third-party package, for example numpy , but I try to avoid third-party packages. In addition, I created random input coordinates and simplified the definition of node value updates to make our discussion simple. The specifics should change a bit and go beyond the scope of my question.
Thanks a lot!
f1 is the initial naive implementation: three nested for loops. def f1(x,y,n,z): rows = [[0]*x for i in xrange(y)] for i in range(n): inputX, inputY = (int(x*random.random()), int(y*random.random())) topleft = (inputX - z, inputY - z) for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)): for j in xrange(max(0, topleft[1]), min(topleft[1]+(z*2), y)): if rows[i][j] <= 255: rows[i][j] += 1
f2 replaces the inner for loop with list comprehension.
def f2(x,y,n,z): rows = [[0]*x for i in xrange(y)] for i in range(n): inputX, inputY = (int(x*random.random()), int(y*random.random())) topleft = (inputX - z, inputY - z) for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)): l = max(0, topleft[1]) r = min(topleft[1]+(z*2), y) rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE: f3 based on Andrei in the comments and replaces the external for with map() . My first hack requires several queries outside the local scale, in particular, those recommended against by Guido: local variables look much faster than global or built-in variable search. I programmed everything except the reference to the main data structure to minimize this overhead.
rows = [[0]*x for i in xrange(y)] def f3(x,y,n,z): inputs = [(int(x*random.random()), int(y*random.random())) for i in range(n)] rows = map(g, inputs) def g(input): inputX, inputY = input topleft = (inputX - 75, inputY - 75) for i in xrange(max(0, topleft[0]), min(topleft[0]+(75*2), 1024)): l = max(0, topleft[1]) r = min(topleft[1]+(75*2), 1024) rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE3: ChristopeD also pointed to several improvements.
def f(x,y,n,z): rows = [[0] * y for i in xrange(x)] rn = random.random for i in xrange(n): topleft = (int(x*rn()) - z, int(y*rn()) - z) l = max(0, topleft[1]) r = min(topleft[1]+(z*2), y) for u in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)): rows[u][l:r] = [j+(j<255) for j in rows[u][l:r]]
UPDATE4: kriss added several improvements to f3 , replacing min / max with the new ternary operator syntax.
def f3b(x,y,n,z): rn = random.random rows = [g1(x, y, z) for x, y in [(int(x*rn()), int(y*rn())) for i in xrange(n)]] def g1(x, y, z): l = y - z if y - z > 0 else 0 r = y + z if y + z < 1024 else 1024 for i in xrange(x - z if x - z > 0 else 0, x + z if x + z < 1024 else 1024 ): rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE5: Alex , with its significant revision, adding a separate map() operation to limit the values โโto 255 and delete all non-local queries. Significant differences are nontrivial.
def f4(x,y,n,z): rows = [[0]*y for i in range(x)] rr = random.randrange inc = (1).__add__ sat = (0xff).__and__ for i in range(n): inputX, inputY = rr(x), rr(y) b = max(0, inputX - z) t = min(inputX + z, x) l = max(0, inputY - z) r = min(inputY + z, y) for i in range(b, t): rows[i][l:r] = map(inc, rows[i][l:r]) for i in range(x): rows[i] = map(sat, rows[i])
Also, since we all seem to be hacking options, here is my test harness for comparing speeds: (improved by ChristopheD)
def timing(f,x,y,z,n): fn = "%s(%d,%d,%d,%d)" % (f.__name__, x, y, z, n) ctx = "from __main__ import %s" % f.__name__ results = timeit.Timer(fn, ctx).timeit(10) return "%4.4s: %.3f" % (f.__name__, results / 10.0) if __name__ == "__main__": print timing(f, 1024, 1024, 400, 75)