How to quickly make this python script? (benchmarking related to industry prediction from post from here)

From here is the branch prediction problem , I started writing a version of the Python program to check the runtime of sorted / unsorted versions in Python. At first I tried to sort.

Here is the code:

import time from random import * arraysize = 327 data = [] for c in range(arraysize): data.append( randint( 1 , 256 ) ) ## !!! with this, the next loop runs faster data.sort() ## test start_time = time.clock() sum = 0 for i in range(100000): for c in range(arraysize): if data[c] >= 128: sum += data[c] print time.clock() - start_time 

I'm not sure about the accuracy of my simple time methodology, but it seems good enough. When I set arraysize = 32768 , I was waiting> 20 minutes for the first time! More than 20 minutes! But with arraysize = 327 , I get the time of 8.141656691s .

Please correct me if I am mistaken somewhere in my code, or in some way using Numpy / Scipy will speed up the process. Thanks.

+4
source share
2 answers

I started with @mgilson's answer and reworked it a bit. I wanted to test the β€œsolution bit” and the lookup table methods, as discussed in my answer to the original question: fooobar.com/questions/1 / ...

I made a few changes to the original. Some of them were just stylish things that reflect my personal preferences. However, I found errors in the original, and I find it important to measure the correct code.

I forced Python code to now accept arguments from the command line and wrote a shell script that runs a Python script using Python 2.x, Python 3.x, and PyPy. Exact versions: Python 2.7.6, Python 3.4.0, and PyPy 2.7.3

I ran tests on Linux Mint 17, 64-bit. The AMD Phenom 9850 processor is clocked at 2.5 GHz with 16 GB of RAM. Linux kernel version, for uname -r ,: 3.13.0-24-generic

The reason I got the code to accept arguments from the command line is because 327 is a pretty short list. I realized that the sum() and generator expressions would be better when the list is much longer, so I made a list from a list and the number of samples is passed from the command line. The results show which Python, as well as the length of the list and the number of tests.

Conclusion: to my surprise, even with a long list, Python used sum() with the list the fastest! There is some overhead for running a generator, which seems to be slower than the overhead of creating a list and then tearing it down.

However, if the list turned out to be really large, I expected that the generator would start to exit due to the understanding of the list. A list of a million random values, listcomp was even faster, but when I went to 16 million random values, genexp got faster. And the speed limit of the expression of the generator is small for shorter lists. Therefore, I still prefer the expression of the generator as a way to solve this problem in Python.

Interestingly, PyPy was the fastest when searching in a table. That makes sense: it was the fastest way I found in C, and PyPy generates native code from JIT.

For CPython with its virtual machine, it is faster to invoke one operation than several operations; Python VM overhead can outweigh more expensive fundamental operations. Thus, integer division is faster than bit masking plus bit offset, since division is one operation. But in PyPy, bit offset + shift is much faster than division.

Also, in CPython, using sum() allows your code to work in C components, so it can be very fast; but in PyPy, sum() is slower than just writing a strighforward loop, that JIT can turn into an evil fast native loop. I assume that the generator technique is not easy for PyPy and is optimized into native code.

Shell script:

 for P in python python3 pypy; do echo "$P ($1, $2)" $P test_branches.py $1 $2 echo done 

Python Code:

 import random import sys import timeit try: RANGE = xrange except NameError: RANGE = range if len(sys.argv) != 3: print("Usage: python test_branches.py <length_of_array> <number_of_trials>") sys.exit(1) TEST_DATA_LEN = int(sys.argv[1]) NUM_REPEATS = int(sys.argv[2]) _test_data = [random.randint(0,255) for _ in RANGE(TEST_DATA_LEN)] def test0(data): """original way""" total = 0 for i in RANGE(TEST_DATA_LEN): if data[i] >= 128: total += data[i] return total def test1(data): """better loop""" total = 0 for n in data: if n >= 128: total += n return total def test2(data): """sum + generator""" return sum(n for n in data if n >= 128) def test3(data): """sum + listcomp""" return sum([n for n in data if n >= 128]) def test4(data): """decision bit -- bit mask and shift""" lst = [0, 0] for n in data: lst[(n & 0x80) >> 7] += n return lst[1] def test5(data): """decision bit -- division""" lst = [0, 0] for n in data: lst[n // 128] += n return lst[1] _lut = [0 if n < 128 else n for n in RANGE(256)] def test6(data): """lookup table""" total = 0 for n in data: total += _lut[n] return total def test7(data): """lookup table with sum()""" return sum(_lut[n] for n in data) test_functions = [v for k,v in globals().items() if k.startswith("test")] test_functions.sort(key=lambda x: x.__name__) correct = test0(_test_data) for fn in test_functions: name = fn.__name__ doc = fn.__doc__ if fn(_test_data) != correct: print("{}() not correct!!!".format(name)) s_call = "{}(_test_data)".format(name) s_import = "from __main__ import {},_test_data".format(name) t = timeit.timeit(s_call,s_import,number=NUM_REPEATS) print("{:7.03f}: {}".format(t, doc)) 

Results:

 python (327, 100000) 3.170: original way 2.211: better loop 2.378: sum + generator 2.188: sum + listcomp 5.321: decision bit -- bit mask and shift 4.977: decision bit -- division 2.937: lookup table 3.464: lookup table with sum() python3 (327, 100000) 5.786: original way 3.444: better loop 3.286: sum + generator 2.968: sum + listcomp 8.858: decision bit -- bit mask and shift 7.056: decision bit -- division 4.640: lookup table 4.783: lookup table with sum() pypy (327, 100000) 0.296: original way 0.312: better loop 1.932: sum + generator 1.011: sum + listcomp 0.172: decision bit -- bit mask and shift 0.613: decision bit -- division 0.140: lookup table 1.977: lookup table with sum() python (65536, 1000) 6.528: original way 4.661: better loop 4.974: sum + generator 4.150: sum + listcomp 10.971: decision bit -- bit mask and shift 10.218: decision bit -- division 6.052: lookup table 7.070: lookup table with sum() python3 (65536, 1000) 12.999: original way 7.618: better loop 6.826: sum + generator 5.587: sum + listcomp 19.326: decision bit -- bit mask and shift 14.917: decision bit -- division 9.779: lookup table 9.575: lookup table with sum() pypy (65536, 1000) 0.681: original way 0.884: better loop 2.640: sum + generator 2.642: sum + listcomp 0.316: decision bit -- bit mask and shift 1.573: decision bit -- division 0.280: lookup table 1.561: lookup table with sum() python (1048576, 100) 10.371: original way 7.065: better loop 7.910: sum + generator 6.579: sum + listcomp 17.583: decision bit -- bit mask and shift 15.426: decision bit -- division 9.285: lookup table 10.850: lookup table with sum() python3 (1048576, 100) 20.435: original way 11.221: better loop 10.162: sum + generator 8.981: sum + listcomp 29.108: decision bit -- bit mask and shift 23.626: decision bit -- division 14.706: lookup table 14.173: lookup table with sum() pypy (1048576, 100) 0.985: original way 0.926: better loop 5.462: sum + generator 6.623: sum + listcomp 0.527: decision bit -- bit mask and shift 2.334: decision bit -- division 0.481: lookup table 5.800: lookup table with sum() python (16777216, 10) 15.704: original way 11.331: better loop 11.995: sum + generator 13.787: sum + listcomp 28.527: decision bit -- bit mask and shift 25.204: decision bit -- division 15.349: lookup table 17.607: lookup table with sum() python3 (16777216, 10) 32.822: original way 18.530: better loop 16.339: sum + generator 14.999: sum + listcomp 47.755: decision bit -- bit mask and shift 38.930: decision bit -- division 23.704: lookup table 22.693: lookup table with sum() pypy (16777216, 10) 1.559: original way 2.234: better loop 6.586: sum + generator 10.931: sum + listcomp 0.817: decision bit -- bit mask and shift 3.714: decision bit -- division 0.752: lookup table 3.837: lookup table with sum() 
+8
source

One small optimization, which is also a matter of style, lists can be crossed directly, rather than indexing them:

 for d in data: if d >= 128: sum += d 

This saves you a few function calls.

This loop can also work faster if you use the built-in sum function:

 my_sum += sum( d for d in data if d>=128 ) 

The comp list may be faster than the generator above (due to a small additional memory):

 my_sum += sum( [d for d in data if d>=128] ) 

Of course, from the point of view of the algorithm, we can get rid of the closed loop itself, since the sum of the inner loop will not change:

 my_sum = 100000 * sum( d for d in data if d>=128 ) 

but I guess you already knew that ...


Finally, here is how I would do it:

 import random import timeit N = 327 testarr = [random.randint(1,256) for _ in range(N)] def test1(data): """Original way""" s = 0 for c in range(N): if data[c] >= 128: s += data[c] def test2(data): """better loop""" s = 0 for d in data: if d >= 128: s += d def test3(data): """ sum + generator """ sum( d for d in data if d >= 128 ) def test4(data): """ sum + listcomp """ sum( [ d for d in data if d >= 128 ]) NNUMBER = 100000 print timeit.timeit('test1(testarr)','from __main__ import test1,testarr',number=NNUMBER) print timeit.timeit('test2(testarr)','from __main__ import test2,testarr',number=NNUMBER) print timeit.timeit('test3(testarr)','from __main__ import test3,testarr',number=NNUMBER) print timeit.timeit('test4(testarr)','from __main__ import test4,testarr',number=NNUMBER) 

My results (OS-X Mavericks, python 2.7.5 - mileage may vary):

 2.80526804924 # loop with range 2.04129099846 # loop over elements 1.91441488266 # sum with generator 2.05234098434 # sum with list 

For me, the generator with sum wins by a small margin. The comp list with sum and explicit loop are roughly equal. Index looping (which is not surprising) is the slowest.

+4
source

Source: https://habr.com/ru/post/1435499/


All Articles