Cython memoryview is slower than expected

I started using memoryviews in Cython to access numpy arrays. One of their advertised benefits is that they are significantly faster than the old buffer support: http://docs.cython.org/src/userguide/memoryviews.html#comparison-to-the-old-buffer-support

However, I have an example where the old numpy buffer support is faster than memoryviews! How can that be? Am I using memoryviews correctly?

This is my test:

    import numpy as np
    cimport numpy as np
    cimport cython

    @cython.boundscheck(False)
    @cython.wraparound(False)
    cpdef np.ndarray[np.uint8_t, ndim=2] image_box1(np.ndarray[np.uint8_t, ndim=2] im, np.ndarray[np.float64_t, ndim=1] pd, int box_half_size):
        cdef unsigned int p0 = <int>(pd[0] + 0.5)
        cdef unsigned int p1 = <int>(pd[1] + 0.5)
        cdef unsigned int top = p1 - box_half_size
        cdef unsigned int left = p0 - box_half_size
        cdef unsigned int bottom = p1 + box_half_size
        cdef unsigned int right = p0 + box_half_size
        cdef np.ndarray[np.uint8_t, ndim=2] box = im[top:bottom, left:right]
        return box

    @cython.boundscheck(False)
    @cython.wraparound(False)
    cpdef np.uint8_t[:, ::1] image_box2(np.uint8_t[:, ::1] im, np.float64_t[:] pd, int box_half_size):
        cdef unsigned int p0 = <int>(pd[0] + 0.5)
        cdef unsigned int p1 = <int>(pd[1] + 0.5)
        cdef unsigned int top = p1 - box_half_size
        cdef unsigned int left = p0 - box_half_size
        cdef unsigned int bottom = p1 + box_half_size
        cdef unsigned int right = p0 + box_half_size
        cdef np.uint8_t[:, ::1] box = im[top:bottom, left:right]
        return box

Timing results:

image_box1: typed numpy: 100000 loops, best of 3: 11.2 µs per loop

image_box2: memoryview: 100000 loops, best of 3: 18.1 µs per loop

These measurements were done in IPython using %timeit image_box1(im, pd, box_half_size).
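For reference, the benchmark just needs a 2D uint8 image, a 2-element float64 point and a box size; the shapes and values below are placeholders for illustration, not the exact data from my run:

    import numpy as np
    # placeholder inputs -- any C-contiguous 2D uint8 image and a 2-element float64 point will do,
    # as long as the box stays inside the image
    im = np.zeros((480, 640), dtype=np.uint8)
    pd = np.array([100.0, 120.0], dtype=np.float64)
    box_half_size = 10
    # then, in IPython:
    # %timeit image_box1(im, pd, box_half_size)
    # %timeit image_box2(im, pd, box_half_size)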

1 answer

OK, I found the problem. As Seberg pointed out, the memoryview version only looks slower because the measurement included the automatic conversion from a numpy array to a memoryview.

I used the following function, inside the Cython module, to measure the time:

    def test(params):
        import timeit

        im = params[0]
        pd = params[1]
        box_half_size = params[2]

        t1 = timeit.Timer(lambda: image_box1(im, pd, box_half_size))
        print 'image_box1: typed numpy:'
        print min(t1.repeat(3, 10))

        cdef np.uint8_t[:, ::1] im2 = im
        cdef np.float64_t[:] pd2 = pd
        t2 = timeit.Timer(lambda: image_box2(im2, pd2, box_half_size))
        print 'image_box2: memoryview:'
        print min(t2.repeat(3, 10))
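Calling it from Python then looks something like this (using placeholder arrays such as the ones in the setup sketch above):

    # test() takes a single tuple of (im, pd, box_half_size)
    test((im, pd, box_half_size))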

Result:

image_box1: typed numpy: 9.07607864065e-05

image_box2: memoryview: 5.81799904467e-05

So, memoryviews really are faster!

Note that I converted im and pd to memoryviews before calling image_box2. If I skip that step and pass im and pd in directly, image_box2 comes out slower (the sketch after the numbers shows that variant):

image_box1: typed numpy: 9.12262257771e-05

image_box2: memoryview: 0.000185245087778
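In other words, the slower number corresponds to timing image_box2 on the raw numpy arrays, so every call pays the ndarray-to-memoryview coercion. Roughly, the slow variant inside test() is a sketch like this:

    # sketch of the slower variant: no cdef pre-conversion, so each call
    # coerces the numpy arrays to memoryviews before image_box2 runs
    t3 = timeit.Timer(lambda: image_box2(im, pd, box_half_size))
    print 'image_box2: memoryview (with per-call conversion):'
    print min(t3.repeat(3, 10))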

