Select specific rows (condition satisfied), but only some columns in Python / Numpy

I have a NumPy array with 4 columns and want to select columns 1, 3 and 4 from the rows where the value in the second column satisfies a specific condition (i.e. equals a fixed value). I first tried selecting only the rows, keeping all four columns:

I = A[A[:,1] == i] 

which works. Then I tried (similar to MATLAB, which I know very well):

 I = A[A[:,1] == i, [0,2,3]] 

which does not work. How can I do this?
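A short sketch of why the second attempt fails, using the sample data from this question. As far as I understand NumPy's advanced indexing, a boolean mask is treated like the integer positions of its True entries, and two index arrays are then paired element by element rather than combined into a block. The names `mask`, `rows`, `failed` and `paired` are only illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [6, 1, 3, 4],
              [3, 2, 5, 6]])
i = 2

mask = A[:, 1] == i          # boolean row mask: [True, False, True]
rows = np.flatnonzero(mask)  # integer positions of the True entries: [0, 2]

# Index arrays of lengths 2 and 3 cannot be broadcast together,
# so this raises IndexError instead of returning a 2x3 block.
try:
    A[mask, [0, 2, 3]]
    failed = False
except IndexError:
    failed = True

# When the lengths do match, the indices are paired element by element:
paired = A[rows, [0, 2]]     # picks A[0, 0] and A[2, 2], not a 2x2 block
```

So unlike MATLAB, passing two index lists does not select the cross product of rows and columns.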


DATA EXAMPLE:

 >>> A = np.array([[1,2,3,4],[6,1,3,4],[3,2,5,6]])
 >>> print(A)
 [[1 2 3 4]
  [6 1 3 4]
  [3 2 5 6]]
 >>> i = 2
 # I want to get columns 1, 3 and 4 for every row which has the value i
 # in the second column. In this case, that is rows 1 and 3 with
 # columns 1, 3 and 4:
 [[1 3 4]
  [3 5 6]]

Now I use this:

 I = A[A[:,1] == i]
 I = I[:, [0,2,3]]

But I think there should be a better way to do this ... (I am used to MATLAB)

5 answers
 >>> a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
 >>> a
 array([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])
 >>> a[a[:,0] > 3]  # select rows where first column is greater than 3
 array([[ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])
 >>> a[a[:,0] > 3][:,np.array([True, True, False, True])]  # select columns
 array([[ 5,  6,  8],
        [ 9, 10, 12]])
 >>> # fancier equivalent of the previous
 >>> a[np.ix_(a[:,0] > 3, np.array([True, True, False, True]))]
 array([[ 5,  6,  8],
        [ 9, 10, 12]])

For an explanation of the obscure np.ix_() see fooobar.com/questions/969964 / ...

Finally, we can simplify by specifying a list of columns instead of a tedious boolean mask:

 >>> a[np.ix_(a[:,0] > 3, (0,1,3))]
 array([[ 5,  6,  8],
        [ 9, 10, 12]])
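To connect this back to the question, here is a small sketch applying np.ix_ to the asker's array. It also shows what np.ix_ returns: a pair of index arrays shaped so that they broadcast into a full rows-by-columns grid. The names `row_idx`, `col_idx` and `result` are just for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [6, 1, 3, 4],
              [3, 2, 5, 6]])
i = 2

# np.ix_ turns the boolean mask into row positions and reshapes both
# index sequences so that they broadcast against each other.
row_idx, col_idx = np.ix_(A[:, 1] == i, [0, 2, 3])

# row_idx is a column vector, col_idx is a row vector;
# together they address every (row, column) combination.
result = A[row_idx, col_idx]
print(row_idx.shape, col_idx.shape)
print(result)
```

This is the closest NumPy equivalent I know of to MATLAB's `A(rows, cols)` block selection.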

If you would rather select the columns by integer index than by a boolean mask, you can write it as follows:

 A[:, [0, 2, 3]][A[:, 1] == i] 

Returning to your example:

 >>> A = np.array([[1,2,3,4],[6,1,3,4],[3,2,5,6]])
 >>> print(A)
 [[1 2 3 4]
  [6 1 3 4]
  [3 2 5 6]]
 >>> i = 2
 >>> print(A[:, [0, 2, 3]][A[:, 1] == i])
 [[1 3 4]
  [3 5 6]]
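One detail worth spelling out, sketched below on the same data: the order of the two chained steps does not matter, provided the row mask is computed from the full array A. After the column selection, the column being tested is no longer present in the intermediate result. The variable names are only illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [6, 1, 3, 4],
              [3, 2, 5, 6]])
i = 2

mask = A[:, 1] == i                      # computed from the full array

cols_then_rows = A[:, [0, 2, 3]][mask]   # pick columns first, then filter rows
rows_then_cols = A[mask][:, [0, 2, 3]]   # filter rows first, then pick columns

same = np.array_equal(cols_then_rows, rows_then_cols)
print(same)
print(cols_then_rows)
```

Both forms build an intermediate array before the second indexing step, which is usually fine for small data.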

Really,

 >>> a = np.array([[1,2,3], [1,3,4], [2,2,5]])
 >>> a[a[:,0]==1][:,[0,1]]
 array([[1, 2],
        [1, 3]])

This also works.

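 col = 1  # index of the second column (indexing starts at 0)
 I = np.array([row[[x for x in range(A.shape[1]) if x != col]] for row in A if row[col] == i])
 print(I)

Edit: indexing starts at 0, so the second column has index 1. The original version used i-1 as the column index, which only works because i = 2 in the example.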


I hope this answers your question. Here is the pattern I use for the same task with pandas:

 df_targetrows = df.loc[df[col2filter]*somecondition*, [col1,col2,...,coln]] 

For instance,

 targets = stockdf.loc[stockdf['rtns'] > .04, ['symbol','date','rtns']] 

This returns a dataframe containing only the columns ['symbol','date','rtns'] from stockdf, restricted to the rows where stockdf['rtns'] > .04.
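A runnable sketch of that pattern. The contents of `stockdf` here are made up purely for illustration (including the extra `volume` column); only the `.loc[row_condition, column_list]` form comes from the answer above.

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
stockdf = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC"],
    "date": ["2020-01-02", "2020-01-02", "2020-01-03"],
    "rtns": [0.05, 0.01, 0.07],
    "volume": [100, 200, 300],
})

# Boolean row condition first, list of column labels second.
targets = stockdf.loc[stockdf["rtns"] > 0.04, ["symbol", "date", "rtns"]]
print(targets)
```

Unlike the plain NumPy case, `.loc` accepts a row mask and a column list together and returns the full block in one step.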

hope this helps


Source: https://habr.com/ru/post/969963/

