Is there a way to speed up this feature?

Question

Is there a way to speed up this feature?

I compare the performance of this F # function:

let e28 N = seq {for i in 2L..2L..N do for j in 1..4 -> i} |> Seq.scan (+) 1L |> Seq.sum

with Python 3.3 equivalents:

 def e28a(N = 100000): diagNumber = 1 sum = diagNumber for width in range(2, N+1, 2): for j in range(4): diagNumber += width sum += diagNumber return sum import itertools as it def e28b(N = 100000): return sum(it.accumulate(it.chain([1], (i for i in range(2, N+1, 2) for j in range(4))))) import numpy as np def e28c(N = 100000): return np.sum(np.cumsum(np.fromiter(chain([1], (i for i in range(2, N+1, 2) for j in range(4))), np.int64)))

and I get the 64-bit performance of CPython 3.3.1 on Windows 7 about 574 times slower than C ++. Here are the times for N = 100000:

e28: 23ms; e28a: 48.4 m; e28b: 49.7 m; e28c: 40.2 m; C ++ Version: 0.07ms

Is there a low level of hanging fruit in optimizing Python code without changing the underlying algorithm?

+4

benchmarking python-3.x f # list-comprehension

Paul jurczak Jul 05 '13 at 10:23

source share

3 answers

The F # version can be accelerated by ~ 10x by switching to a procedural, mutable approach (like your python e28a ). When the “payload operation” (simply + in this case) is so trivial, the use of combinators adds a relatively significant overhead. As a side note, Seq.sum uses check arithmetic, which also adds overhead.

One of the nice things about F # is that you can revert to a process / mutable style if necessary for a Persian critical hot way.

 let e28_original N = seq { for i in 2UL..2UL..N do for j in 1..4 do yield i } |> Seq.scan (+) 1UL |> Seq.sum let e28_mutable N = let mutable sum = 1UL let mutable total = sum for i in 2UL..2UL..N do for j in 1..4 do sum <- sum + i total <- total + sum total let time f = f () |> ignore // allow for warmup / JIT let sw = System.Diagnostics.Stopwatch.StartNew() let result = f () sw.Stop() printfn "Result: %A Elapsed: %A" result sw.Elapsed time (fun _ -> e28_original 100000UL) time (fun _ -> e28_mutable 100000UL)

Result

 Result: 666691667100001UL Elapsed: 00:00:00.0429414 Result: 666691667100001UL Elapsed: 00:00:00.0034971

+4

latkin Jul 05 '13 at 20:33

source share

It is almost twice as fast on my car. It uses memoization as well as basic arithmetic deductions.

You must define a global variable.

 summi=2 def e28d(N = 100000): def memo(width): global summi summi+=width*4+4 return summi-width*2+2 x= sum((memo(width*4)) for width in range (2, N+1, 2))+1 return x

Results:
e28a:

0.0591201782227 seconds

e28d:

0.0349650382996 seconds

I hope that this is at least constructive. Note: you will have to modulate it depending on whether the number is odd or not.

Update: Here is a function that works a hundred times faster in python (about 0.5 ms for N = 100000), avoiding loops completely:

 import math def e28e(X = 100000): keyint, keybool=int(X/6), X%6 if keybool/2==0: keyvar=(16*keyint+sum(range(keyint))*12) elif keybool/2==1: keyvar=(44*keyint+sum(range(keyint))*36+7) else: keyvar=(28*(keyint+1)+sum(range(keyint+1))*60-2) X-=keybool%2 diag= math.pow(X,2)+2*X+1 newvar=keyint+int(X/2)+1 summ= int(diag*newvar+keyvar) return summ

+1

JonLeslieHarding Jul 05 '13 at 13:52

source share

mydogisbox · Accepted Answer · 2013-07-05T18:26:40+0000

Using your version of F #, I got:

 > e28(100000L);; Real: 00:00:00.061, CPU: 00:00:00.062, GC gen0: 2, gen1: 0, gen2: 0 val it : int64 = 666691667100001L

Using:

 let e28d N = seq {2L..2L..N} |> Seq.collect(fun x->seq{yield x;yield x; yield x; yield x}) |> Seq.scan (+) 1L |> Seq.sum

I got:

 > e28d(100000L);; Real: 00:00:00.040, CPU: 00:00:00.031, GC gen0: 2, gen1: 0, gen2: 0 val it : int64 = 666691667100001L

You will probably have a hard time getting Python to work with both F # and since F # has been compiled and interpreted by Python. At the same time, the above improvement will work on python too:

 >>> def e28a(N = 100000): diagNumber = 1; sum = diagNumber; for width in range(2, N+1, 2): for j in range(4): diagNumber += width; sum += diagNumber; return sum; >>> if __name__ == '__main__': import timeit print(timeit.timeit("e28a()", setup="from __main__ import e28a", number=10)) 0.5249497228663813 >>> def e28a(N = 100000): diagNumber = 1; sum = diagNumber; for width in range(2, N+1, 2): diagNumber += width; sum += diagNumber; diagNumber += width; sum += diagNumber; diagNumber += width; sum += diagNumber; diagNumber += width; sum += diagNumber; return sum; >>> if __name__ == '__main__': import timeit print(timeit.timeit("e28a()", setup="from __main__ import e28a", number=10)) 0.2585966329330063 >>>

Part of this improvement comes from fewer function calls, that is:

 >>> def e28a(N = 100000): diagNumber = 1; sum = diagNumber; temp_range = range(4) #Change here for width in range(2, N+1, 2): for j in temp_range: #Change here diagNumber += width; sum += diagNumber; return sum; >>> if __name__ == '__main__': import timeit print(timeit.timeit("e28a()", setup="from __main__ import e28a", number=10)) 0.40251470339956086 >>>

And I think the other part comes from deleting the loop. Both of them can be quite expensive in Python.

Is there a way to speed up this feature?

More articles: