NumPy: how to get a sub-matrix with a boolean slice

My question is: how can I get a sub-matrix, the same way I would get a sub-array, using a boolean slice?

For instance:

    a2 = np.array(np.arange(30).reshape(5, 6))
    a2[a2[:, 1] > 10]

gives me:

    array([[12, 13, 14, 15, 16, 17],
           [18, 19, 20, 21, 22, 23],
           [24, 25, 26, 27, 28, 29]])

while

    m2 = np.mat(np.arange(30).reshape(5, 6))
    m2[m2[:, 1] > 10]

gives me:

    matrix([[12, 18, 24]])

Why are the results different, and how can I get the same result from a matrix as from an array?

Thanks!

2 answers

The problem you are facing is that operations on a matrix always return a 2-dimensional result.

When you create a mask in the first array, you get:

    In [24]: a2[:,1] > 10
    Out[24]: array([False, False,  True,  True,  True], dtype=bool)

which, as you can see, is a one-dimensional array.

When you do the same with the matrix, you get:

    In [25]: m2[:,1] > 10
    Out[25]:
    matrix([[False],
            [False],
            [ True],
            [ True],
            [ True]], dtype=bool)

In other words, you have an n x 1 array, not a one-dimensional array of length n.
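
The difference in mask shape can be checked directly. The following is a minimal sketch; it uses np.asmatrix instead of the np.mat call from the question (an assumption on my part, chosen because it should work on both older and newer NumPy releases).

```python
import numpy as np

a2 = np.arange(30).reshape(5, 6)   # plain ndarray
m2 = np.asmatrix(a2)               # matrix wrapping the same data

col_a = a2[:, 1]   # ndarray slicing drops the column axis
col_m = m2[:, 1]   # matrix slicing always stays two-dimensional

print(col_a.shape, (col_a > 10).shape)   # (5,) (5,)
print(col_m.shape, (col_m > 10).shape)   # (5, 1) (5, 1)
```

The comparison keeps whatever shape the slice had, so the matrix mask ends up as a column rather than a flat vector.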


Indexing in NumPy works differently depending on whether the index is a one-dimensional or an n-dimensional array.

In your first case, NumPy treats the boolean array of length n as a mask over the rows, so you get the expected result:

    In [28]: a2[a2[:,1] > 10]
    Out[28]:
    array([[12, 13, 14, 15, 16, 17],
           [18, 19, 20, 21, 22, 23],
           [24, 25, 26, 27, 28, 29]])

In the second case, since the index is a two-dimensional boolean array, every True entry identifies both a row and a column. The mask has only one column, so NumPy picks single elements from the corresponding (first) column instead of whole rows:

    In [29]: m2[m2[:,1] > 10]
    Out[29]: matrix([[12, 18, 24]])
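
To make the second case concrete, here is a small sketch on a plain ndarray that spells out which (row, column) pairs an n x 1 mask refers to. As far as I know, recent NumPy releases reject a boolean index whose shape does not match the indexed array and raise an IndexError, so the sketch does not index with the mask directly; it uses np.nonzero and integer indexing, which should be equivalent to what the session above did.

```python
import numpy as np

a2 = np.arange(30).reshape(5, 6)

# Build a (5, 1) mask, the same shape the matrix comparison produces.
mask_2d = (a2[:, 1] > 10).reshape(-1, 1)

# Each True entry has a row index and a column index.
rows, cols = np.nonzero(mask_2d)
print(rows)   # [2 3 4]
print(cols)   # [0 0 0]

# Indexing with both index arrays picks single elements, not whole rows.
picked = a2[rows, cols]
print(picked)   # [12 18 24]
```

Every column index is 0 because the mask only has one column, which is why the values come from the first column of the data and not from column 1.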

To answer your question: you can get the array behavior by converting your mask to an array and taking its first column, which gives back a one-dimensional array of length n:

    In [32]: m2[np.array(m2[:,1] > 10)[:,0]]
    Out[32]:
    matrix([[12, 13, 14, 15, 16, 17],
            [18, 19, 20, 21, 22, 23],
            [24, 25, 26, 27, 28, 29]])

Alternatively, you can convert the matrix to an array first, which yields the same one-dimensional mask as in the array case:

    In [34]: np.array(m2)[:,1] > 10
    Out[34]: array([False, False,  True,  True,  True], dtype=bool)
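
The session above stops at the mask, so here is a hedged sketch of the remaining step: applying that one-dimensional mask to either the array or the matrix. The variable names are mine, and np.asmatrix / np.asarray stand in for the np.mat / np.array calls used earlier.

```python
import numpy as np

m2 = np.asmatrix(np.arange(30).reshape(5, 6))

arr = np.asarray(m2)      # plain ndarray with the same contents
mask = arr[:, 1] > 10     # one-dimensional, shape (5,)

rows_as_array = arr[mask]   # ndarray of shape (3, 6)
rows_as_matrix = m2[mask]   # still a matrix, same rows

print(rows_as_array)
print(type(rows_as_matrix).__name__)   # matrix
```

Because the mask is one-dimensional, it selects whole rows in both cases; only the type of the result differs.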

Now both of these fixes require conversions between matrices and arrays, which can be pretty ugly.

The question I asked myself is why you want to use a matrix and expect array behavior. Perhaps the right tool for your work is an array, not a matrix.


If you flatten the boolean mask as follows:

 m2[np.asarray(m2[:,1]>10).flatten()] 

you will get the same result, but I would recommend using np.array instead of np.matrix for the reasons mentioned in this answer.
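
If the flattening is needed in several places, it can be wrapped in a small function. This is only a sketch: row_mask is a hypothetical helper name, not something NumPy provides, and ravel is used where the answer uses flatten (both give a one-dimensional result here). The matrix attribute .A1 is shown as another way to get a flat ndarray from a matrix.

```python
import numpy as np

def row_mask(m, column, threshold):
    """Hypothetical helper: 1-D boolean mask of rows where m[:, column] > threshold.

    Works for both ndarray and matrix input, since the comparison result
    is converted to a plain array and flattened.
    """
    return np.asarray(m[:, column] > threshold).ravel()

m2 = np.asmatrix(np.arange(30).reshape(5, 6))
a2 = np.asarray(m2)

mask_from_matrix = row_mask(m2, 1, 10)
mask_from_array = row_mask(a2, 1, 10)
mask_a1 = (m2[:, 1] > 10).A1   # matrix-only shortcut for a flat ndarray

selected = m2[mask_from_matrix]   # the three matching rows, as a matrix
print(selected.shape)             # (3, 6)
```

All three masks are identical, so code that may receive either type can index with the helper's result without caring which one it got.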


Source: https://habr.com/ru/post/1202824/

