Select pandas DataFrame rows based on two column values

Question


I want to select some specific rows based on two column values. For instance:

    import numpy as np
    import pandas as pd

    d = {'user': [1., 2., 3., 4], 'item': [5., 6., 7., 8.],
         'f1': [9., 16., 17., 18.], 'f2': [4, 5, 6, 5], 'f3': [4, 5, 5, 8]}
    df = pd.DataFrame(d)
    print(df)

    Out:
       f1  f2  f3  item  user
    0   9   4   4     5     1
    1  16   5   5     6     2
    2  17   6   5     7     3
    3  18   5   8     8     4

I want to select rows based on the values of "user" and "item". Given a 2D numpy array that stores [user, item] value pairs:

    samples = np.array([[1,5],[3,7],[3,7],[2,6]])

    Out:
    array([[1, 5],
           [3, 7],
           [3, 7],
           [2, 6]])

Then the expected result:

    Out:
       f1  f2  f3  item  user
    0   9   4   4     5     1
    2  17   6   5     7     3
    2  17   6   5     7     3
    1  16   5   5     6     2

My final goal is to get a 2D numpy array that stores all the column values except for item and user, which is:

    Out:
    array([[ 9,  4,  4],
           [17,  6,  5],
           [17,  6,  5],
           [16,  5,  5]])

As we can see, these are the values of the columns f1, f2, f3.

How can I do this?

+6

python arrays numpy pandas dataframe

Excalibur Jun 01 '15 at 20:11


2 answers

One approach that leans on NumPy arrays is:

    import numpy as np

    # Convert item and user columns to a 2-column array
    item_user_arr = np.asarray(df[["item","user"]]).astype(int)

    # Mask of matches across rows of samples and item_user_arr, with columns flipped
    mask = (samples[:,None,1]==item_user_arr[:,0]) & (samples[:,None,0]==item_user_arr[:,1])

    # Get indices of matches
    _,C = np.where(mask)

    # Use those indices to select data from f1,f2,f3 columns for final output array
    out = np.asarray(df[["f1","f2","f3"]])[C,:]

The output for these inputs is

    In [536]: out
    Out[536]:
    array([[  9.,   4.,   4.],
           [ 17.,   6.,   5.],
           [ 17.,   6.,   5.],
           [ 16.,   5.,   5.]])
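To make the shapes in the broadcasting step easier to follow, here is a self-contained sketch of the same idea. It is a variation rather than the answer's exact code: it compares whole [user, item] pairs at once with `.all(axis=2)`, and the names `keys`, `sample_idx`, and `row_idx` are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'user': [1., 2., 3., 4.],
    'item': [5., 6., 7., 8.],
    'f1': [9., 16., 17., 18.],
    'f2': [4, 5, 6, 5],
    'f3': [4, 5, 5, 8],
})
samples = np.array([[1, 5], [3, 7], [3, 7], [2, 6]])

# Key columns in the same [user, item] order as samples: shape (4, 2)
keys = df[['user', 'item']].to_numpy()

# Compare every sample with every df row:
# (n_samples, 1, 2) == (1, n_rows, 2) broadcasts to (n_samples, n_rows, 2);
# a pair matches only if both user and item agree.
mask = (samples[:, None, :] == keys[None, :, :]).all(axis=2)

# Nonzero entries come back in row-major order, so row_idx follows
# the order of samples (assuming each sample matches exactly one row).
sample_idx, row_idx = np.nonzero(mask)

out = df[['f1', 'f2', 'f3']].to_numpy()[row_idx]
print(out)
```

The mask has one row per sample and one column per DataFrame row, so memory grows with the product of the two sizes; for large inputs a join is likely the better fit.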

+1

Divakar Jun 01 '15 at 20:41


unutbu · Accepted Answer · 2015-06-01T20:31:45+0000

If you make samples a DataFrame with user and item columns, then you can get the desired values using an inner join. By default, pd.merge joins on all columns that samples and df have in common; in this case, user and item. Hence,

 result = pd.merge(samples, df, how='inner')

gives

       user  item  f1  f2  f3
    0     1     5   9   4   4
    1     3     7  17   6   5
    2     3     7  17   6   5
    3     2     6  16   5   5

    import numpy as np
    import pandas as pd

    d = {'user': [1., 2., 3., 4], 'item': [5., 6., 7., 8.],
         'f1': [9., 16., 17., 18.], 'f2': [4, 5, 6, 5], 'f3': [4, 5, 5, 8]}
    df = pd.DataFrame(d)

    samples = np.array([[1,5],[3,7],[3,7],[2,6]])
    samples = pd.DataFrame(samples, columns=['user', 'item'])

    result = pd.merge(samples, df, how='inner')
    result = result[['f1', 'f2', 'f3']]
    result = result.values
    print(result)

gives

    [[  9.   4.   4.]
     [ 17.   6.   5.]
     [ 17.   6.   5.]
     [ 16.   5.   5.]]
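An inner join silently drops any sample pair that has no matching row. If that matters, one possible variation (not part of the answer above) is to name the join keys explicitly with `on=` and use a left join with `indicator=True`, which adds a `_merge` column marking each row as matched or not. The sketch below assumes a hypothetical unmatched pair `[9, 9]` and casts the key columns to int so both sides share a dtype; `samples_df`, `merged`, `missing`, and `features` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'user': [1., 2., 3., 4.],
    'item': [5., 6., 7., 8.],
    'f1': [9., 16., 17., 18.],
    'f2': [4, 5, 6, 5],
    'f3': [4, 5, 5, 8],
})
# Cast the key columns to int so they match the dtype of the samples
df = df.astype({'user': int, 'item': int})

# Hypothetical samples: the last pair (9, 9) does not exist in df
samples_df = pd.DataFrame([[1, 5], [3, 7], [9, 9]], columns=['user', 'item'])

# A left join keeps every sample in its original order;
# the _merge column reports 'both' or 'left_only' per row.
merged = pd.merge(samples_df, df, on=['user', 'item'],
                  how='left', indicator=True)

# Samples with no matching row in df
missing = merged.loc[merged['_merge'] == 'left_only', ['user', 'item']]

# Feature values for the samples that did match
features = merged.loc[merged['_merge'] == 'both', ['f1', 'f2', 'f3']].to_numpy()

print(missing)
print(features)
```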
