Calculate the sum of operator results without allocating an unnecessary array

I have two numpy boolean arrays (a and b). I need to find how many of their elements are equal. I currently do len(a) - (a ^ b).sum(), but as I understand it, the xor operation creates an entirely new numpy array. How can I implement this efficiently without allocating an unnecessary temporary array?
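Concretely, this is the baseline being optimized, shown with a small 1-D boolean pair (a sketch; the real arrays are 2-D and about 25x40):

```python
import numpy as np

a = np.array([True, False, True, False, True])
b = np.array([False, True, True, False, True])

# Current approach: the xor allocates a temporary array the size of a.
temp = a ^ b                    # the unwanted allocation
count = len(a) - temp.sum()     # equal elements = total - differing

assert count == 3
assert count == (a == b).sum()  # same result, also via a temporary
```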

I tried using numexpr, but I can't get it to work well. It does not support the notion that True is 1 and False is 0, so I have to use ne.evaluate("sum(where(a==b, 1, 0))"), which takes about twice as long.
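For reference, the pure-NumPy equivalent of that numexpr expression looks like this (a sketch; numexpr compiles the whole string into a single pass over the data, while NumPy materializes the intermediate where array):

```python
import numpy as np

a = np.array([[True, False], [True, True]])
b = np.array([[False, False], [True, True]])

# where(a == b, 1, 0) maps equal positions to 1, others to 0,
# then sum counts them -- the same logic the numexpr string expresses.
count = np.where(a == b, 1, 0).sum()

assert count == 3
```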

Edit: I forgot to mention that one of these arrays is actually a view of another array of a different size, and both arrays should be treated as immutable. Both arrays are 2-dimensional and about 25x40 in size.

Yes, this is the bottleneck of my program and is worth optimizing.

+4
4 answers

On my machine, this is faster:

 (a == b).sum() 

If you don't want to use any additional storage, I would suggest using numba. I'm not too familiar with it, but it works well; I ran into some problems getting Cython to accept a NumPy boolean array.

    from numba import autojit
    from numpy.random import rand

    def pysumeq(a, b):
        tot = 0
        for i in range(a.shape[0]):
            for j in range(a.shape[1]):
                if a[i, j] == b[i, j]:
                    tot += 1
        return tot

    # Make the numba-compiled version.
    nbsumeq = autojit(pysumeq)

    A = (rand(10, 10) < .5)
    B = (rand(10, 10) < .5)

    # Do a simple dry run to get it to compile
    # for this specific use case.
    nbsumeq(A, B)
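As a sanity check, the pure-Python loop can be verified against NumPy's own count before handing it to numba (a sketch that runs without numba installed):

```python
import numpy as np

def pysumeq(a, b):
    tot = 0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            if a[i, j] == b[i, j]:
                tot += 1
    return tot

rng = np.random.default_rng(42)
A = rng.random((25, 40)) < 0.5
B = rng.random((25, 40)) < 0.5

# The explicit loop and the vectorized expression must agree.
assert pysumeq(A, B) == (A == B).sum()
```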

If you don't have numba, I would suggest using @user2357112's answer.

Edit: I just got a Cython version working; here is the .pyx file. I would go with this.

    from numpy cimport ndarray as ar
    cimport numpy as np
    cimport cython

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def cysumeq(ar[np.uint8_t, ndim=2, cast=True] a,
                ar[np.uint8_t, ndim=2, cast=True] b):
        cdef int i, j, h=a.shape[0], w=a.shape[1], tot=0
        for i in range(h):
            for j in range(w):
                if a[i, j] == b[i, j]:
                    tot += 1
        return tot
+2

To start with, you can fold the len(a) - ... step away by negating the xor and summing directly:

    >>> a
    array([ True, False,  True, False,  True], dtype=bool)
    >>> b
    array([False,  True,  True, False,  True], dtype=bool)
    >>> np.sum(~(a ^ b))
    3
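For boolean arrays, ~(a ^ b) is the same predicate as a == b, so both counts agree (a sketch; note this form still allocates temporaries for the xor and the negation):

```python
import numpy as np

a = np.array([True, False, True, False, True])
b = np.array([False, True, True, False, True])

# xnor: True exactly where the elements match.
assert np.array_equal(~(a ^ b), a == b)
assert np.sum(~(a ^ b)) == 3
```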

If you don't mind destroying a or b, i.e. modifying one of them in place, I'm not sure you can get faster than this:

    >>> a ^= b   # in-place xor operator
    >>> np.sum(~a)
    3
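Note that ~a still allocates one temporary; if a is already being destroyed anyway, the negation can be done in place as well (a sketch of that refinement):

```python
import numpy as np

a = np.array([True, False, True, False, True])
b = np.array([False, True, True, False, True])

np.logical_xor(a, b, out=a)   # in-place xor, same effect as a ^= b
np.logical_not(a, out=a)      # in-place negation, avoids the ~a temporary
count = a.sum()               # reduces to a scalar, no array allocated

assert count == 3
```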
+1

If the problem is allocation and deallocation, maintain one output array and tell NumPy to write the results into it each time:

    out = np.empty_like(a)  # Allocate this outside a loop and use it every iteration
    num_eq = np.equal(a, b, out).sum()

This will only work if the inputs are always the same size. If the inputs have different sizes, you can allocate one large array and take a view of the size you need for each call, but I'm not sure how much that slows you down.
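A sketch of both variants, assuming a hot loop that repeatedly compares pairs of arrays (the buffer name scratch is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((25, 40)) < 0.5
b = rng.random((25, 40)) < 0.5

# Same-size inputs: allocate the output once, reuse it every iteration.
out = np.empty_like(a)
for _ in range(3):  # stand-in for the real loop
    num_eq = np.equal(a, b, out).sum()

assert num_eq == (a == b).sum()

# Differently sized inputs: one big flat buffer, sliced per call.
scratch = np.empty(10000, dtype=bool)
c = rng.random((10, 7)) < 0.5
d = rng.random((10, 7)) < 0.5
view = scratch[:c.size].reshape(c.shape)  # a view, no new allocation
assert np.equal(c, d, view).sum() == (c == d).sum()
```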

+1
source

Improving on IanH's answer, you can also access the underlying C array of a NumPy array from within Cython by specifying mode="c" in the ndarray type.

    from numpy cimport ndarray as ar
    cimport numpy as np
    cimport cython

    @cython.boundscheck(False)
    @cython.wraparound(False)
    cdef int cy_sum_eq(ar[np.uint8_t, ndim=2, cast=True, mode="c"] a,
                       ar[np.uint8_t, ndim=2, cast=True, mode="c"] b):
        cdef int i, j, h=a.shape[0], w=a.shape[1], tot=0
        cdef np.uint8_t* adata = &a[0, 0]
        cdef np.uint8_t* bdata = &b[0, 0]
        for i in range(h):
            for j in range(w):
                if adata[j] == bdata[j]:
                    tot += 1
            adata += w
            bdata += w
        return tot

This is about 40% faster on my machine than IanH's Cython version, and I found that reordering the contents of the loop doesn't seem to matter much at this point, probably due to compiler optimizations. From here, one could link in a C function optimized with SSE to perform this operation, passing adata and bdata as uint8_t*.

0

Source: https://habr.com/ru/post/1494407/

