Vectorization for performance

I want to avoid using the for loop in the following code to achieve performance. Is vectorization suitable for this kind of problem?

a = np.array([[0,1,2,3,4], [5,6,7,8,9], [0,1,2,3,4], [5,6,7,8,9], [0,1,2,3,4]],dtype= np.float32) temp_a = np.copy(a) for i in range(1,a.shape[0]-1): for j in range(1,a.shape[1]-1): if a[i,j] > 3: temp_a[i+1,j] += a[i,j] / 5. temp_a[i-1,j] += a[i,j] / 5. temp_a[i,j+1] += a[i,j] / 5. temp_a[i,j-1] += a[i,j] / 5. temp_a[i,j] -= a[i,j] * 4. / 5. a = np.copy(temp_a) 
+5
source share
3 answers

Basically you are convolving with a special mode for borders.

Try the following:

 from scipy.signal import convolve2d # define your filter f = np.array([[0.0, 0.2, 0.0], [0.2,-0.8, 0.2], [0.0, 0.2, 0.0]]) # select parts of 'a' to be used for convolution b = (a * (a > 3))[1:-1, 1:-1] # convolve, padding with zeros ('same' mode) c = convolve2d(b, f, mode='same') # add the convolved result to 'a', excluding borders a[1:-1, 1:-1] += c # treat the special cases of the borders a[0, 1:-1] += .2 * b[0, :] a[-1, 1:-1] += .2 * b[-1, :] a[1:-1, 0] += .2 * b[:, 0] a[1:-1, -1] += .2 * b[:, -1] 

It gives the following result, which matches your nested loops.

 [[ 0. 2.2 3.4 4.6 4. ] [ 6.2 2.6 4.2 3. 10.6] [ 0. 3.4 4.8 6.2 4. ] [ 6.2 2.6 4.2 3. 10.6] [ 0. 2.2 3.4 4.6 4. ]] 
+2
source

My path uses 3 filters, rot90 , np.where , np.sum and np.multiply . I am not sure which way to the standard is more reasonable. If you do not take into account the creation time of the filters, it is about 4 times faster.

 # Each filter basically does what `op` tries to achieve in a loop filter1 = np.array([[0, 1 ,0, 0, 0], [1, -4, 1, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]) /5. filter2 = np.array([[0, 0 ,1, 0, 0], [0, 1, -4, 1, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]) /5. filter3 = np.array([[0, 0 ,0, 0, 0], [0, 0, 1, 0, 0], [0, 1, -4, 1, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 0]]) /5. # only loop over the center of the matrix, a center = np.array([[0, 0, 0, 0, 0], [0, 1, 1, 1, 0], [0, 1, 1, 1, 0], [0, 1, 1, 1, 0], [0, 0, 0, 0, 0]]) 

filter1 and filter2 can be rotated to represent 4 filters individually.

 filter1_90_rot = np.rot90(filter1, k=1) filter1_180_rot = np.rot90(filter1, k=2) filter1_270_rot = np.rot90(filter1, k=3) filter2_90_rot = np.rot90(filter2, k=1) filter2_180_rot = np.rot90(filter2, k=2) filter2_270_rot = np.rot90(filter2, k=3) # Based on different index from `a` return different filter filter_dict = { (1,1): filter1, (3,1): filter1_90_rot, (3,3): filter1_180_rot, (1,3): filter1_270_rot, (1,2): filter2, (2,1): filter2_90_rot, (3,2): filter2_180_rot, (2,3): filter2_270_rot, (2,2): filter3 } 

Main function

 def get_new_a(a): x, y = np.where(((a > 3) * center) > 0) # find pairs that match the condition return a + np.sum(np.multiply(filter_dict[i, j], a[i,j]) for (i, j) in zip(x,y)) 

Note. There seems to be some numerical errors, so np.equal() will basically return False between my result and OP, while np.close() will return true.

Sync Results

 def op(): temp_a = np.copy(a) for i in range(1,a.shape[0]-1): for j in range(1,a.shape[1]-1): if a[i,j] > 3: temp_a[i+1,j] += a[i,j] / 5. temp_a[i-1,j] += a[i,j] / 5. temp_a[i,j+1] += a[i,j] / 5. temp_a[i,j-1] += a[i,j] / 5. temp_a[i,j] -= a[i,j] * 4. / 5. a2 = np.copy(temp_a) %timeit op() 167 µs ± 2.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) %timeit get_new_a(a): 37.2 µs ± 2.68 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) 

Remember again, we ignore the filter creation time, as I think it will be one time. If you want to enable time for creating filters , it's about twice as fast. You might think this is unfair because the op method contains two np.copy . The disadvantage of the op method, I think, is the for loop.

Reference:

numpy.multiply performs elemental multiplication between two matrices.
np.rot90 does the rotation for us. k is a parameter that you can decide how many times to rotate. np.isclose can use this function to check if two matrices are closed inside some error that you can define.

+1
source

I came up with this solution:

 a = np.array([[0,0,0,0,0], [0,6,2,8,0], [0,1,5,3,0], [0,6,7,8,0], [0,0,0,0,0]],dtype= np.float32) up= np.zeros_like(a) down= np.zeros_like(a) right= np.zeros_like(a) left = np.zeros_like(a) def new_a(a,u,r,d,l): c = np.copy(a) c[c <= 3] = 0 up[:-2, 1:-1] += c[1:-1,1:-1] / 5. down[2:, 1:-1] += c[1:-1,1:-1] / 5. left[1:-1, :-2] += c[1:-1,1:-1]/ 5. right[1:-1, 2:] += c[1:-1,1:-1] / 5. a[1:-1,1:-1] -= c[1:-1,1:-1] * 4. / 5. a += up + down + left + right return a 
+1
source

Source: https://habr.com/ru/post/1274436/


All Articles