TL;DR: In Cython, why (or when) is iterating through a NumPy array faster than iterating through a Python list?
In general: I have used Cython before and was able to get huge speedups compared to naive Python implementations; however, figuring out what exactly needs to be done seems nontrivial.
Consider the following three implementations of sum(). They live in a Cython file called cy (obviously np.sum() exists, but that is beside my point).
Naive Python:
    def sum_naive(A):
        s = 0
        for a in A:
            s += a
        return s
Cython with a function that expects a Python list:

    def sum_list(A):
        cdef unsigned long s = 0
        for a in A:
            s += a
        return s
Cython with a function that expects a NumPy array:

    def sum_np(np.ndarray[np.int64_t, ndim=1] A):
        cdef unsigned long s = 0
        for a in A:
            s += a
        return s
I would expect the runtimes to be ordered sum_np < sum_list < sum_naive; however, the following script demonstrates the opposite (for completeness, I also timed np.sum()):
    N = 1000000
    v_np = np.array(range(N))
    v_list = range(N)
    %timeit cy.sum_naive(v_list)
    %timeit cy.sum_naive(v_np)
    %timeit cy.sum_list(v_list)
    %timeit cy.sum_np(v_np)
    %timeit v_np.sum()
with the results:
    In [18]: %timeit cyMatching.sum_naive(v_list)
    100 loops, best of 3: 18.7 ms per loop
    In [19]: %timeit cyMatching.sum_naive(v_np)
    1 loops, best of 3: 389 ms per loop
    In [20]: %timeit cyMatching.sum_list(v_list)
    10 loops, best of 3: 82.9 ms per loop
    In [21]: %timeit cyMatching.sum_np(v_np)
    1 loops, best of 3: 1.14 s per loop
    In [22]: %timeit v_np.sum()
    1000 loops, best of 3: 659 us per loop
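For illustration, the difference in element types is visible even in pure Python: iterating over a NumPy array yields freshly boxed NumPy scalar objects, while iterating over a list just hands back the existing Python ints (a small check; the variable names here are my own):

```python
import numpy as np

v_np = np.arange(5)          # NumPy array of integers
v_list = list(range(5))      # plain Python list of ints

# Each element pulled out of the array is a boxed NumPy scalar,
# created on the fly; list iteration returns existing objects.
for a, b in zip(v_np, v_list):
    assert isinstance(a, np.integer)   # e.g. np.int64
    assert type(b) is int
```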
What is happening? Why is Cython + NumPy so slow here?
P.S. I use:

    #cython: boundscheck=False
    #cython: wraparound=False
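For comparison, the idiom that usually lets Cython compile the loop down to C is an explicit typed index into the buffer, rather than `for a in A` (a sketch of the standard pattern, not timed here; `sum_np_idx` is a hypothetical name):

```cython
cimport numpy as np

def sum_np_idx(np.ndarray[np.int64_t, ndim=1] A):
    cdef unsigned long s = 0
    cdef Py_ssize_t i
    # A[i] with a C-typed index compiles to a direct buffer access,
    # avoiding per-element Python object creation.
    for i in range(A.shape[0]):
        s += A[i]
    return s
```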