I started with @mgilson's answer and reworked it a bit. I wanted to test the βsolution bitβ and the lookup table methods, as discussed in my answer to the original question: fooobar.com/questions/1 / ...
I made a few changes to the original. Some of them were just stylish things that reflect my personal preferences. However, I found errors in the original, and I find it important to measure the correct code.
I forced Python code to now accept arguments from the command line and wrote a shell script that runs a Python script using Python 2.x, Python 3.x, and PyPy. Exact versions: Python 2.7.6, Python 3.4.0, and PyPy 2.7.3
I ran tests on Linux Mint 17, 64-bit. The AMD Phenom 9850 processor is clocked at 2.5 GHz with 16 GB of RAM. Linux kernel version, for uname -r ,: 3.13.0-24-generic
The reason I got the code to accept arguments from the command line is because 327 is a pretty short list. I realized that the sum() and generator expressions would be better when the list is much longer, so I made a list from a list and the number of samples is passed from the command line. The results show which Python, as well as the length of the list and the number of tests.
Conclusion: to my surprise, even with a long list, Python used sum() with the list the fastest! There is some overhead for running a generator, which seems to be slower than the overhead of creating a list and then tearing it down.
However, if the list turned out to be really large, I expected that the generator would start to exit due to the understanding of the list. A list of a million random values, listcomp was even faster, but when I went to 16 million random values, genexp got faster. And the speed limit of the expression of the generator is small for shorter lists. Therefore, I still prefer the expression of the generator as a way to solve this problem in Python.
Interestingly, PyPy was the fastest when searching in a table. That makes sense: it was the fastest way I found in C, and PyPy generates native code from JIT.
For CPython with its virtual machine, it is faster to invoke one operation than several operations; Python VM overhead can outweigh more expensive fundamental operations. Thus, integer division is faster than bit masking plus bit offset, since division is one operation. But in PyPy, bit offset + shift is much faster than division.
Also, in CPython, using sum() allows your code to work in C components, so it can be very fast; but in PyPy, sum() is slower than just writing a strighforward loop, that JIT can turn into an evil fast native loop. I assume that the generator technique is not easy for PyPy and is optimized into native code.
Shell script:
for P in python python3 pypy; do echo "$P ($1, $2)" $P test_branches.py $1 $2 echo done
Python Code:
import random import sys import timeit try: RANGE = xrange except NameError: RANGE = range if len(sys.argv) != 3: print("Usage: python test_branches.py <length_of_array> <number_of_trials>") sys.exit(1) TEST_DATA_LEN = int(sys.argv[1]) NUM_REPEATS = int(sys.argv[2]) _test_data = [random.randint(0,255) for _ in RANGE(TEST_DATA_LEN)] def test0(data): """original way""" total = 0 for i in RANGE(TEST_DATA_LEN): if data[i] >= 128: total += data[i] return total def test1(data): """better loop""" total = 0 for n in data: if n >= 128: total += n return total def test2(data): """sum + generator""" return sum(n for n in data if n >= 128) def test3(data): """sum + listcomp""" return sum([n for n in data if n >= 128]) def test4(data): """decision bit -- bit mask and shift""" lst = [0, 0] for n in data: lst[(n & 0x80) >> 7] += n return lst[1] def test5(data): """decision bit -- division""" lst = [0, 0] for n in data: lst[n // 128] += n return lst[1] _lut = [0 if n < 128 else n for n in RANGE(256)] def test6(data): """lookup table""" total = 0 for n in data: total += _lut[n] return total def test7(data): """lookup table with sum()""" return sum(_lut[n] for n in data) test_functions = [v for k,v in globals().items() if k.startswith("test")] test_functions.sort(key=lambda x: x.__name__) correct = test0(_test_data) for fn in test_functions: name = fn.__name__ doc = fn.__doc__ if fn(_test_data) != correct: print("{}() not correct!!!".format(name)) s_call = "{}(_test_data)".format(name) s_import = "from __main__ import {},_test_data".format(name) t = timeit.timeit(s_call,s_import,number=NUM_REPEATS) print("{:7.03f}: {}".format(t, doc))
Results:
python (327, 100000) 3.170: original way 2.211: better loop 2.378: sum + generator 2.188: sum + listcomp 5.321: decision bit -- bit mask and shift 4.977: decision bit -- division 2.937: lookup table 3.464: lookup table with sum() python3 (327, 100000) 5.786: original way 3.444: better loop 3.286: sum + generator 2.968: sum + listcomp 8.858: decision bit -- bit mask and shift 7.056: decision bit -- division 4.640: lookup table 4.783: lookup table with sum() pypy (327, 100000) 0.296: original way 0.312: better loop 1.932: sum + generator 1.011: sum + listcomp 0.172: decision bit -- bit mask and shift 0.613: decision bit -- division 0.140: lookup table 1.977: lookup table with sum() python (65536, 1000) 6.528: original way 4.661: better loop 4.974: sum + generator 4.150: sum + listcomp 10.971: decision bit -- bit mask and shift 10.218: decision bit -- division 6.052: lookup table 7.070: lookup table with sum() python3 (65536, 1000) 12.999: original way 7.618: better loop 6.826: sum + generator 5.587: sum + listcomp 19.326: decision bit -- bit mask and shift 14.917: decision bit -- division 9.779: lookup table 9.575: lookup table with sum() pypy (65536, 1000) 0.681: original way 0.884: better loop 2.640: sum + generator 2.642: sum + listcomp 0.316: decision bit -- bit mask and shift 1.573: decision bit -- division 0.280: lookup table 1.561: lookup table with sum() python (1048576, 100) 10.371: original way 7.065: better loop 7.910: sum + generator 6.579: sum + listcomp 17.583: decision bit -- bit mask and shift 15.426: decision bit -- division 9.285: lookup table 10.850: lookup table with sum() python3 (1048576, 100) 20.435: original way 11.221: better loop 10.162: sum + generator 8.981: sum + listcomp 29.108: decision bit -- bit mask and shift 23.626: decision bit -- division 14.706: lookup table 14.173: lookup table with sum() pypy (1048576, 100) 0.985: original way 0.926: better loop 5.462: sum + generator 6.623: sum + listcomp 0.527: decision bit -- bit mask and shift 2.334: decision bit -- division 0.481: lookup table 5.800: lookup table with sum() python (16777216, 10) 15.704: original way 11.331: better loop 11.995: sum + generator 13.787: sum + listcomp 28.527: decision bit -- bit mask and shift 25.204: decision bit -- division 15.349: lookup table 17.607: lookup table with sum() python3 (16777216, 10) 32.822: original way 18.530: better loop 16.339: sum + generator 14.999: sum + listcomp 47.755: decision bit -- bit mask and shift 38.930: decision bit -- division 23.704: lookup table 22.693: lookup table with sum() pypy (16777216, 10) 1.559: original way 2.234: better loop 6.586: sum + generator 10.931: sum + listcomp 0.817: decision bit -- bit mask and shift 3.714: decision bit -- division 0.752: lookup table 3.837: lookup table with sum()