Bean Elements in a String - Vectorized 2D Bincount for NumPy

I have a NumPy array with integer values. Matrix values ​​range from 0 to the maximum element in the matrix (in other words, all numbers from 0 to the maximum data element presented in it). I need to build efficient (effective means of a quick, fully vectorized solution) to find the number of elements in each row and encode them according to the matrix values.

I could not find a similar question or a question that somehow helped solve this problem.

So, if I have this data in the input:

 # shape is (N0=4, m0=4) 1 1 0 4 2 4 2 1 1 2 3 5 4 4 4 1 

desired result:

 # shape(N=N0, m=data.max()+1): 1 2 0 0 1 0 0 1 2 0 1 0 0 1 1 1 0 1 0 1 0 0 3 0 

I know how to solve this by simply counting the unique values ​​in each row of the iteration data one by one, and then combining the results, taking into account all the possible values ​​in the data array.

When using NumPy for vectorization, this key problem is that finding each number one by one is slow and assuming that there are many unique numbers, this may not be an effective solution. As a rule, both N and the number of unique numbers are quite large (by the way, N seems to be larger than the number of unique numbers).

Does anyone have any great ideas?)

+5
source share
1 answer

Well, in principle, np.bincount works with 1D arrays. But we must use it on each line iteratively (thinking about it is simple). To make it vectorized, we could compensate each row with this maximum number. The idea is to have different bins for each row so that they are not affected by other elements of the row with the same numbers.

Therefore, the implementation will be -

 # Vectorized solution def bincount2D_vectorized(a): N = a.max()+1 a_offs = a + np.arange(a.shape[0])[:,None]*N return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N) 

Run Example -

 In [189]: a Out[189]: array([[1, 1, 0, 4], [2, 4, 2, 1], [1, 2, 3, 5], [4, 4, 4, 1]]) In [190]: bincount2D_vectorized(a) Out[190]: array([[1, 2, 0, 0, 1, 0], [0, 1, 2, 0, 1, 0], [0, 1, 1, 1, 0, 1], [0, 1, 0, 0, 3, 0]]) 

Numba tweaks

We can introduce numba to speed numba up. Now numba allows several settings.

  • First, it allows JIT compilation.

  • In addition, they recently introduced an experimental parallel , which automatically parallelizes operations in a function known as parallel semantics.

  • The final setup will be to use prange as a substitution for range . The docs claim this works parallel to loops, similar to OpenMP parallel for loops and Cythons prange. prange works well with larger datasets, which is probably due to the overhead required to set up parallel operation.

So, with these two new settings, along with njit for non-Python mode, we will have three options -

 # Numba solutions def bincount2D_numba(a, use_parallel=False, use_prange=False): N = a.max()+1 m,n = a.shape out = np.zeros((m,N),dtype=int) # Choose fucntion based on args func = bincount2D_numba_func0 if use_parallel: if use_prange: func = bincount2D_numba_func2 else: func = bincount2D_numba_func1 # Run chosen function on input data and output func(a, out, m, n) return out @njit def bincount2D_numba_func0(a, out, m, n): for i in range(m): for j in range(n): out[i,a[i,j]] += 1 @njit(parallel=True) def bincount2D_numba_func1(a, out, m, n): for i in range(m): for j in range(n): out[i,a[i,j]] += 1 @njit(parallel=True) def bincount2D_numba_func2(a, out, m, n): for i in prange(m): for j in prange(n): out[i,a[i,j]] += 1 

For completeness and testing later, the loopback version will be -

 # Loopy solution def bincount2D_loopy(a): N = a.max()+1 m,n = a.shape out = np.zeros((m,N),dtype=int) for i in range(m): out[i] = np.bincount(a[i], minlength=N) return out 

Runtime test

Case No. 1:

 In [312]: a = np.random.randint(0,100,(100,100)) In [313]: %timeit bincount2D_loopy(a) ...: %timeit bincount2D_vectorized(a) ...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False) ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False) ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True) 10000 loops, best of 3: 115 µs per loop 10000 loops, best of 3: 36.7 µs per loop 10000 loops, best of 3: 22.6 µs per loop 10000 loops, best of 3: 22.7 µs per loop 10000 loops, best of 3: 39.9 µs per loop 

Case No. 2:

 In [316]: a = np.random.randint(0,100,(1000,1000)) In [317]: %timeit bincount2D_loopy(a) ...: %timeit bincount2D_vectorized(a) ...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False) ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False) ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True) 100 loops, best of 3: 2.97 ms per loop 100 loops, best of 3: 3.54 ms per loop 1000 loops, best of 3: 1.83 ms per loop 100 loops, best of 3: 1.78 ms per loop 1000 loops, best of 3: 1.4 ms per loop 

Case No. 3:

 In [318]: a = np.random.randint(0,1000,(1000,1000)) In [319]: %timeit bincount2D_loopy(a) ...: %timeit bincount2D_vectorized(a) ...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False) ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False) ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True) 100 loops, best of 3: 4.01 ms per loop 100 loops, best of 3: 4.86 ms per loop 100 loops, best of 3: 3.21 ms per loop 100 loops, best of 3: 3.18 ms per loop 100 loops, best of 3: 2.45 ms per loop 

numba options numba work very well. The choice of one of the three options will depend on the shape parameters of the input array and to some extent on the number of unique elements in it.

+7
source

Source: https://habr.com/ru/post/1271877/


All Articles