Select pandas DataFrame rows based on two column values

I want to select some specific rows based on two column values. For instance:

    d = {'user' : [1., 2., 3., 4] ,'item' : [5., 6., 7., 8.],'f1' : [9., 16., 17., 18.], 'f2':[4,5,6,5], 'f3':[4,5,5,8]}
    df = pd.DataFrame(d)
    print(df)

    Out:
       f1  f2  f3  item  user
    0   9   4   4     5     1
    1  16   5   5     6     2
    2  17   6   5     7     3
    3  18   5   8     8     4

I want to select rows based on the values of "user" and "item". Given a 2d numpy array that stores [user, item] value pairs:

    samples = np.array([[1,5],[3,7],[3,7],[2,6]])

    Out:
    array([[1, 5],
           [3, 7],
           [3, 7],
           [2, 6]])

Then the expected result:

    Out:
       f1  f2  f3  item  user
    0   9   4   4     5     1
    2  17   6   5     7     3
    2  17   6   5     7     3
    1  16   5   5     6     2

My final goal is then to get a 2d numpy array that stores all the column values except those of item and user, that is:

 Out: array([[9, 4, 4], [17, 6, 5], [17, 6, 5], [16, 5, 5]]) 

As we can see, these are the values of the columns f1, f2, f3.

How can I do this?

2 answers

If you make samples a DataFrame with user and item columns, then you can get the desired values using an inner join. By default, pd.merge joins on all columns that samples and df have in common; in this case those are user and item. Hence,

 result = pd.merge(samples, df, how='inner') 

gives

       user  item  f1  f2  f3
    0     1     5   9   4   4
    1     3     7  17   6   5
    2     3     7  17   6   5
    3     2     6  16   5   5

    import numpy as np
    import pandas as pd

    d = {'user' : [1., 2., 3., 4] ,'item' : [5., 6., 7., 8.],'f1' : [9., 16., 17., 18.], 'f2':[4,5,6,5], 'f3':[4,5,5,8]}
    df = pd.DataFrame(d)
    samples = np.array([[1,5],[3,7],[3,7],[2,6]])
    samples = pd.DataFrame(samples, columns=['user', 'item'])

    result = pd.merge(samples, df, how='inner')
    result = result[['f1', 'f2', 'f3']]
    result = result.values
    print(result)

gives

    [[  9.   4.   4.]
     [ 17.   6.   5.]
     [ 17.   6.   5.]
     [ 16.   5.   5.]]
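If you would rather not rely on the shared-column default, the join keys can be named explicitly with `on=`. The following is only a sketch of that variant: the helper name `select_features` and its `keys`/`features` parameters are made up for illustration and are not part of pandas. According to the pandas documentation, an inner join preserves the order of the left keys, which is why the rows come back in the order of samples.

```python
import numpy as np
import pandas as pd


def select_features(df, samples, keys=("user", "item"), features=("f1", "f2", "f3")):
    """Return the feature rows of df that match each key pair in samples.

    Hypothetical helper for illustration only; not a pandas API.
    """
    # Wrap the pairs in a DataFrame so they can be joined on the key columns
    left = pd.DataFrame(samples, columns=list(keys))
    # Name the join keys explicitly instead of relying on the shared-column default
    merged = left.merge(df, on=list(keys), how="inner")
    # Keep only the feature columns and convert to a 2d numpy array
    return merged[list(features)].to_numpy()


df = pd.DataFrame({
    'user': [1., 2., 3., 4],
    'item': [5., 6., 7., 8.],
    'f1': [9., 16., 17., 18.],
    'f2': [4, 5, 6, 5],
    'f3': [4, 5, 5, 8],
})
samples = np.array([[1, 5], [3, 7], [3, 7], [2, 6]])

result = select_features(df, samples)
print(result)
```

Note that with `how='inner'` a pair in samples that has no matching row in df is dropped from the result without any error.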

One approach that leans more on NumPy arrays is

    import numpy as np

    # Convert item and user columns to a 2-column array
    item_user_arr = np.asarray(df[["item","user"]]).astype(int)

    # Mask of matches across rows of samples and item_user_arr, with columns flipped
    mask = ((samples[:,None,1]==item_user_arr[:,0]) &
            (samples[:,None,0]==item_user_arr[:,1]))

    # Get indices of matches
    _,C = np.where(mask)

    # Use those indices to select data from f1,f2,f3 columns for final output array
    out = np.asarray(df[["f1","f2","f3"]])[C,:]

The output for these inputs is

    In [536]: out
    Out[536]:
    array([[  9.,   4.,   4.],
           [ 17.,   6.,   5.],
           [ 17.,   6.,   5.],
           [ 16.,   5.,   5.]])
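One thing the mask does not report by itself is a pair in samples that has no matching row: the index lookup simply yields nothing for it, so the output ends up shorter than samples. Below is a minimal sketch of detecting that case. It assumes the key and feature columns have already been pulled into plain arrays; the names `user_item_arr`, `features`, `found` and `missing`, and the pair (9, 9), are invented for illustration.

```python
import numpy as np

# Key pairs in (user, item) order and the matching f1, f2, f3 values,
# written out directly here instead of being taken from df
user_item_arr = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])
features = np.array([[9., 4., 4.], [16., 5., 5.], [17., 6., 5.], [18., 5., 8.]])

# The last pair (9, 9) is a made-up example that is absent from the data
samples = np.array([[1, 5], [3, 7], [9, 9]])

# mask[i, j] is True when samples[i] equals user_item_arr[j] in both columns
mask = (samples[:, None, :] == user_item_arr[None, :, :]).all(axis=2)

# A row of samples is found if it matches at least one row of the data
found = mask.any(axis=1)
missing = samples[~found]

# Column indices of the matches, ordered by row of samples
_, cols = np.nonzero(mask)
out = features[cols]

print(missing)
print(out)
```

Checking `found.all()` before using `out` is one way to make sure every requested pair was actually present.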

Source: https://habr.com/ru/post/988345/

