How can I vectorize the averaging of 2x2 sub-arrays of a numpy array?

I have a very large 2D numpy array made up of 2x2 sub-arrays that I need to average. I am looking for a way to vectorize this operation. For example, for x:

    #              |- col 0 -| |- col 1 -| |- col 2 -|
    x = np.array([[ 0.0,  1.0,  2.0,  3.0,  4.0,  5.0],   # row 0
                  [ 6.0,  7.0,  8.0,  9.0, 10.0, 11.0],   # row 0
                  [12.0, 13.0, 14.0, 15.0, 16.0, 17.0],   # row 1
                  [18.0, 19.0, 20.0, 21.0, 22.0, 23.0]])  # row 1

I need to get a 2x3 array whose entries are the averages of each 2x2 sub-array, i.e.:

    result = np.array([[ 3.5,  5.5,  7.5],
                       [15.5, 17.5, 19.5]])

So element [0,0] is the average of x[0:2, 0:2], and element [0,1] is the average of x[0:2, 2:4]. Does numpy have a vectorized / efficient way to compute aggregates over such subsets?

1 answer

If we form the reshaped array y = x.reshape(2,2,3,2), then the (i, j) 2x2 submatrix is given by the expression y[i,:,j,:]. For instance:

    In [340]: x
    Out[340]:
    array([[  0.,   1.,   2.,   3.,   4.,   5.],
           [  6.,   7.,   8.,   9.,  10.,  11.],
           [ 12.,  13.,  14.,  15.,  16.,  17.],
           [ 18.,  19.,  20.,  21.,  22.,  23.]])

    In [341]: y = x.reshape(2,2,3,2)

    In [342]: y[0,:,0,:]
    Out[342]:
    array([[ 0.,  1.],
           [ 6.,  7.]])

    In [343]: y[1,:,2,:]
    Out[343]:
    array([[ 16.,  17.],
           [ 22.,  23.]])

To get the average of each 2x2 submatrix, use the mean method with axis=(1,3):

    In [344]: y.mean(axis=(1,3))
    Out[344]:
    array([[  3.5,   5.5,   7.5],
           [ 15.5,  17.5,  19.5]])

If you are using an older version of numpy that does not support the use of a tuple for an axis, you can do:

    In [345]: y.mean(axis=1).mean(axis=-1)
    Out[345]:
    array([[  3.5,   5.5,   7.5],
           [ 15.5,  17.5,  19.5]])
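The session above can be reproduced as a standalone script. This is a minimal sketch that rebuilds the question's 4x6 array with np.arange (an assumption made for brevity; the values match the array shown in the question) and checks that the tuple-axis form and the chained form agree. The names avg_tuple and avg_chained are illustrative only.

```python
import numpy as np

# Same values as the 4x6 array in the question: 0.0 .. 23.0, row by row.
x = np.arange(24, dtype=float).reshape(4, 6)

# Rows split into (2 block rows, 2 rows per block);
# columns split into (3 block columns, 2 columns per block).
y = x.reshape(2, 2, 3, 2)

# Average over the two "within-block" axes in one call.
avg_tuple = y.mean(axis=(1, 3))

# Equivalent chained form for numpy versions without tuple-axis support.
avg_chained = y.mean(axis=1).mean(axis=-1)

print(avg_tuple)
print(np.allclose(avg_tuple, avg_chained))
```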

See the link in @dashesy's comment for more information on the reshape trick.


To generalize this to a 2-D array with shape (m, n), where m and n are even, use

    y = x.reshape(x.shape[0]//2, 2, x.shape[1]//2, 2)

y can then be interpreted as an array of 2x2 arrays. The first and third index slots of the 4-dimensional array act as the indices that select one of the 2x2 blocks. To get the top left 2x2 block, use y[0, :, 0, :]; to get the block in the second row and third column of blocks, use y[1, :, 2, :]; and in general, to access the (j, k) block, use y[j, :, k, :].
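One way to convince yourself of this indexing rule is to compare each y[j, :, k, :] with the corresponding slice of x. The sketch below assumes an arbitrary 8x10 test array built with np.arange (not the random array used in the example further down); blocks_match is an illustrative name.

```python
import numpy as np

# Arbitrary 8x10 test array; any 2-D array with even dimensions would do.
x = np.arange(8 * 10).reshape(8, 10)
y = x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2)

# Block (j, k) of y should equal rows 2j..2j+1 and columns 2k..2k+1 of x.
blocks_match = all(
    np.array_equal(y[j, :, k, :], x[2 * j:2 * j + 2, 2 * k:2 * k + 2])
    for j in range(y.shape[0])
    for k in range(y.shape[2])
)

print(y.shape)
print(blocks_match)
```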

To calculate the reduced array of block averages, use the mean method with axis=(1, 3) (i.e., average over axes 1 and 3):

 avg = y.mean(axis=(1, 3)) 
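The reshape and the mean can be wrapped into a small function. The following is only a sketch: block_mean_2x2 is a hypothetical helper name, not a numpy function, and the shape checks are an added assumption about how you might want invalid input handled.

```python
import numpy as np

def block_mean_2x2(x):
    """Hypothetical helper: mean of each non-overlapping 2x2 block of a 2-D array."""
    x = np.asarray(x)
    if x.ndim != 2:
        raise ValueError("expected a 2-D array")
    m, n = x.shape
    if m % 2 or n % 2:
        raise ValueError("both dimensions must be even")
    # Integer division (//) keeps the reshape arguments integers.
    y = x.reshape(m // 2, 2, n // 2, 2)
    # Axes 1 and 3 index positions inside a block; averaging them leaves (m/2, n/2).
    return y.mean(axis=(1, 3))

x = np.arange(24, dtype=float).reshape(4, 6)  # same values as the question's array
result = block_mean_2x2(x)
print(result)
```

The explicit check on even dimensions is there because reshape would otherwise fail with a less descriptive error when a dimension is odd.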

Here is an example where x has shape (8, 10), so the array of 2x2 block averages has shape (4, 5):

    In [10]: np.random.seed(123)

    In [11]: x = np.random.randint(0, 4, size=(8, 10))

    In [12]: x
    Out[12]:
    array([[2, 1, 2, 2, 0, 2, 2, 1, 3, 2],
           [3, 1, 2, 1, 0, 1, 2, 3, 1, 0],
           [2, 0, 3, 1, 3, 2, 1, 0, 0, 0],
           [0, 1, 3, 3, 2, 0, 3, 2, 0, 3],
           [0, 1, 0, 3, 1, 3, 0, 0, 0, 2],
           [1, 1, 2, 2, 3, 2, 1, 0, 0, 3],
           [2, 1, 0, 3, 2, 2, 2, 2, 1, 2],
           [0, 3, 3, 3, 1, 0, 2, 0, 2, 1]])

    In [13]: y = x.reshape(x.shape[0]//2, 2, x.shape[1]//2, 2)

Take a look at a couple of 2x2 blocks:

    In [14]: y[0, :, 0, :]
    Out[14]:
    array([[2, 1],
           [3, 1]])

    In [15]: y[1, :, 2, :]
    Out[15]:
    array([[3, 2],
           [2, 0]])

Calculate the averages of the blocks:

    In [16]: avg = y.mean(axis=(1, 3))

    In [17]: avg
    Out[17]:
    array([[ 1.75,  1.75,  0.75,  2.  ,  1.5 ],
           [ 0.75,  2.5 ,  1.75,  1.5 ,  0.75],
           [ 0.75,  1.75,  2.25,  0.25,  1.25],
           [ 1.5 ,  2.25,  1.25,  1.5 ,  1.5 ]])

Source: https://habr.com/ru/post/1206663/

