Iterating over arrays in Cython: is a list faster than np.array?

TL;DR: in Cython, why (or when) is iterating over a NumPy array slower than iterating over a Python list?

More generally: I have used Cython before and was able to get huge speedups over naive Python implementations; however, figuring out what exactly needs to be done seems nontrivial.

Consider the following three implementations of sum(). They all live in a Cython file called cy (obviously np.sum() exists, but that is beside my point).
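A minimal build script for such a module might look like this (assuming the file is named cy.pyx; this setup.py is a sketch, not part of the original post):

    # setup.py -- minimal build sketch; the file name cy.pyx is assumed
    from distutils.core import setup
    from Cython.Build import cythonize

    # annotate=True also writes cy.html, highlighting Python interaction
    setup(ext_modules=cythonize("cy.pyx", annotate=True))

built with python setup.py build_ext --inplace.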

Naive Python:

    def sum_naive(A):
        s = 0
        for a in A:
            s += a
        return s

Cython, with a function that expects a Python list:

    def sum_list(A):
        cdef unsigned long s = 0
        for a in A:
            s += a
        return s

Cython, with a function that expects a NumPy array:

    # (requires "cimport numpy as np" at the top of cy.pyx)
    def sum_np(np.ndarray[np.int64_t, ndim=1] A):
        cdef unsigned long s = 0
        for a in A:
            s += a
        return s

In terms of runtime I would expect sum_np < sum_list < sum_naive; however, the following script demonstrates the opposite (for completeness, I also included np.sum()):

    N = 1000000
    v_np = np.array(range(N))
    v_list = range(N)

    %timeit cy.sum_naive(v_list)
    %timeit cy.sum_naive(v_np)
    %timeit cy.sum_list(v_list)
    %timeit cy.sum_np(v_np)
    %timeit v_np.sum()

with the results:

    In [18]: %timeit cyMatching.sum_naive(v_list)
    100 loops, best of 3: 18.7 ms per loop
    In [19]: %timeit cyMatching.sum_naive(v_np)
    1 loops, best of 3: 389 ms per loop
    In [20]: %timeit cyMatching.sum_list(v_list)
    10 loops, best of 3: 82.9 ms per loop
    In [21]: %timeit cyMatching.sum_np(v_np)
    1 loops, best of 3: 1.14 s per loop
    In [22]: %timeit v_np.sum()
    1000 loops, best of 3: 659 us per loop

What is happening here? Why is Cython + NumPy so slow?

P.S. I use the compiler directives:

    # cython: boundscheck=False
    # cython: wraparound=False
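The same directives can also be scoped to a single function with decorators rather than file-wide comments; a minimal sketch:

    cimport cython
    cimport numpy as np

    @cython.boundscheck(False)  # same effect as the file-level directives,
    @cython.wraparound(False)   # but scoped to this one function
    def sum_np(np.ndarray[np.int64_t, ndim=1] A):
        cdef unsigned long s = 0
        for a in A:
            s += a
        return s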

2 answers

There is a better way to implement this in Cython which, at least on my machine, beats np.sum, because it avoids the type checking and other things that NumPy normally has to do when dealing with an arbitrary array:

    # cython: wraparound=False
    # cython: boundscheck=False
    cimport numpy as np

    def sum_np(np.ndarray[np.int64_t, ndim=1] A):
        cdef unsigned long s = 0
        for a in A:
            s += a
        return s

    def sum_np2(np.int64_t[::1] A):
        cdef:
            unsigned long s = 0
            size_t k
        for k in range(A.shape[0]):
            s += A[k]
        return s

And then the timings:

    N = 1000000
    v_np = np.array(range(N))
    v_list = range(N)

    %timeit sum(v_list)
    %timeit sum_naive(v_list)
    %timeit np.sum(v_np)
    %timeit sum_np(v_np)
    %timeit sum_np2(v_np)

    10 loops, best of 3: 19.5 ms per loop
    10 loops, best of 3: 64.9 ms per loop
    1000 loops, best of 3: 1.62 ms per loop
    1 loops, best of 3: 1.7 s per loop
    1000 loops, best of 3: 1.42 ms per loop

You do not want to iterate over the NumPy array Python-style; instead, access its elements by index inside a typed loop, since that can be translated into pure C rather than relying on the Python API.


a is untyped, so there will be many conversions from Python types to C types and back again. These can be slow.

JoshAdel correctly pointed out that rather than iterating over the array itself, you should iterate over a range and index into the array; Cython converts the indexing into C, which is fast.
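To illustrate the first point, a sketch (the name sum_list_typed is mine, not from the thread): typing the loop variable makes the arithmetic pure C, although each element is still unboxed from a Python object once per iteration, so the indexed loop above remains the better fix.

    # Sketch: typed loop variable. Each element is still fetched through
    # the Python iterator protocol and converted to C once per iteration,
    # but the "s += a" arithmetic now happens entirely in C.
    def sum_list_typed(A):
        cdef unsigned long s = 0
        cdef long a
        for a in A:
            s += a
        return s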


Running cython -a myfile.pyx will highlight such things for you; in the annotated HTML output you want all of your loop logic to be white (no Python interaction) for maximum speed.

PS: Note that np.ndarray[np.int64_t, ndim=1] is outdated and has been deprecated in favor of the faster and more general long[:] typed memoryview.
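In that style, the indexed loop from the first answer would look like this (sum_np3 is a hypothetical name, a sketch rather than code from the thread):

    # Sketch: the same indexed loop with the newer typed-memoryview
    # signature; long[:] accepts any 1-D integer buffer of matching width.
    def sum_np3(long[:] A):
        cdef:
            unsigned long s = 0
            Py_ssize_t k
        for k in range(A.shape[0]):
            s += A[k]
        return s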

