I have a DataFrame with four columns:
from pandas import DataFrame
df = DataFrame({'order_id': [134,101,131,159,101,189,120,102,134,130,231,421,141,129,141,101],
'user_id': [24,10,24,12,24,10,10,24,21,12,12,10,12,17,24,12],
'product_id': [1004,1041,1078,1001,1001,1074,1001,1019,1021,1004,1001,1010,1004,1004,1017,1004],
'sector': ['a','a','b','d','c','a','c','a','c','a','b','c','a','b','a','a']})
order_id product_id sector user_id
120 1001 c 10
421 1010 c 10
101 1041 a 10
189 1074 a 10
159 1001 d 12
231 1001 b 12
130 1004 a 12
141 1004 a 12
101 1004 a 12
129 1004 b 17
134 1021 c 21
101 1001 c 24
134 1004 a 24
141 1017 a 24
102 1019 a 24
131 1078 b 24
For each product_id, I want to filter the DataFrame as follows: for every (product_id, user_id) pair, select the rows of that user_id whose order_id is larger than the maximum order_id associated with the pair.
For example, for product_id 1001 the maximum order_id associated with user_id 10 is 120, the maximum order_id associated with user_id 12 is 231, and the maximum order_id for user_id 24 is 101. So for product_id 1001 I would return the DataFrame:
df2=DataFrame({'order_id':[421,189,134,141,102,131],
'product_id':[1010, 1074,1004,1017,1019,1078],
'sector':['c','a','a','a','a','b'],
'user_id':[10,10,24,24,24,24]})
order_id product_id sector user_id
421 1010 c 10
189 1074 a 10
134 1004 a 24
141 1017 a 24
102 1019 a 24
131 1078 b 24
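The rule as worded above can be sketched roughly as follows. This is only a minimal illustration for a single product_id, not a definitive solution: the helper name `later_orders` is hypothetical, and it assumes "larger" means strictly greater than the user's maximum order_id for that product. Users who never ordered the product get no threshold and therefore contribute no rows.

```python
import pandas as pd

df = pd.DataFrame({
    'order_id': [134, 101, 131, 159, 101, 189, 120, 102, 134, 130, 231, 421, 141, 129, 141, 101],
    'user_id': [24, 10, 24, 12, 24, 10, 10, 24, 21, 12, 12, 10, 12, 17, 24, 12],
    'product_id': [1004, 1041, 1078, 1001, 1001, 1074, 1001, 1019, 1021, 1004, 1001, 1010, 1004, 1004, 1017, 1004],
    'sector': ['a', 'a', 'b', 'd', 'c', 'a', 'c', 'a', 'c', 'a', 'b', 'c', 'a', 'b', 'a', 'a'],
})


def later_orders(frame, product_id):
    """Hypothetical helper: rows whose order_id exceeds the user's max order_id for product_id."""
    # Maximum order_id per user, restricted to rows of the chosen product.
    max_per_user = (
        frame.loc[frame['product_id'] == product_id]
        .groupby('user_id')['order_id']
        .max()
    )
    # Look up each row's per-user threshold; users without the product get NaN.
    threshold = frame['user_id'].map(max_per_user)
    # A comparison against NaN is False, so those users drop out automatically.
    return frame[frame['order_id'] > threshold]


result = later_orders(df, 1001).sort_values(['user_id', 'product_id'])
print(result)
```

For product_id 1001 this selects the same six order_id values as df2 above (421 and 189 for user_id 10; 134, 141, 102 and 131 for user_id 24).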
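In the case of product_id 1004, user_id 10 has no row for it, so user_id 10 is skipped. user_id 12
has a maximum order_id of 141 for product_id 1004. I expect no order_id to be returned for user_id 12, so nothing is returned for that user.
user_id 17 has only a row for product_id 1004, and no other product_id.
So nothing is returned for user_id 17. Finally, for user_id 24 the order_id
for product_id 1004 is 134. product_id 1017 has order_id 141, which is larger.
So for product_id 1004, I would return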
order_id product_id sector user_id
141 1017 a 24
and similarly for every other product_id.
My attempt so far groups by user_id and sorts each group by order_id and product_id:
# Group the rows by user, then print each user's rows sorted by order_id and product_id
df3 = df.groupby(['user_id'])
for key, val in df3:
    d = val.sort_values(['order_id', 'product_id'])
    print(d)
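One possible building block for the filter is the maximum order_id of every (product_id, user_id) pair, which a two-key groupby can compute directly. The sketch below is only an illustration under that assumption; the variable name `pair_max` is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'order_id': [134, 101, 131, 159, 101, 189, 120, 102, 134, 130, 231, 421, 141, 129, 141, 101],
    'user_id': [24, 10, 24, 12, 24, 10, 10, 24, 21, 12, 12, 10, 12, 17, 24, 12],
    'product_id': [1004, 1041, 1078, 1001, 1001, 1074, 1001, 1019, 1021, 1004, 1001, 1010, 1004, 1004, 1017, 1004],
    'sector': ['a', 'a', 'b', 'd', 'c', 'a', 'c', 'a', 'c', 'a', 'b', 'c', 'a', 'b', 'a', 'a'],
})

# Maximum order_id for each (product_id, user_id) pair,
# returned as a Series with a two-level (product_id, user_id) index.
pair_max = df.groupby(['product_id', 'user_id'])['order_id'].max()

# Per-user maxima for one product, indexed by user_id.
print(pair_max.loc[1001])
```

For product_id 1001 the per-user maxima are 120, 231 and 101 for user_id 10, 12 and 24, the same thresholds quoted in the example above.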