Filter data by max element from group by pair

I have a data frame with four columns:

from pandas import DataFrame

df = DataFrame({'order_id': [134,101,131,159,101,189,120,102,134,130,231,421,141,129,141,101],
                'user_id': [24,10,24,12,24,10,10,24,21,12,12,10,12,17,24,12],
                'product_id': [1004,1041,1078,1001,1001,1074,1001,1019,1021,1004,1001,1010,1004,1004,1017,1004],
                'sector': ['a','a','b','d','c','a','c','a','c','a','b','c','a','b','a','a']})

order_id    product_id  sector  user_id
    120      1001          c     10
    421      1010          c     10
    101      1041          a     10
    189      1074          a     10
    159      1001          d     12
    231      1001          b     12
    130      1004          a     12
    141      1004          a     12
    101      1004          a     12
    129      1004          b     17
    134      1021          c     21
    101      1001          c     24
    134      1004          a     24
    141      1017          a     24
    102      1019          a     24
    131      1078          b     24

For a given product_id, I want to filter the dataframe per user: for each user_id, take the maximum order_id associated with the pair (product_id, user_id), and keep only that user's rows whose order_id is larger than this maximum.

For example, for product_id 1001 the maximum order_id associated with user_id 10 is 120, the maximum order_id associated with user_id 12 is 231, and for user_id 24 the maximum order_id is 101, so for product_id 1001 I would return this DataFrame:

df2=DataFrame({'order_id':[421,189,134,141,102,131],
'product_id':[1010, 1074,1004,1017,1019,1078],
'sector':['c','a','a','a','a','b'],
'user_id':[10,10,24,24,24,24]})

order_id    product_id  sector  user_id
    421        1010       c         10
    189        1074       a         10
    134        1004       a         24
    141        1017       a         24
    102        1019       a         24
    131        1078       b         24
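
The per-user maxima quoted in the example above can be checked with a short sketch. It rebuilds the frame from the question; the name `max_per_user` is only illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'order_id': [134, 101, 131, 159, 101, 189, 120, 102, 134, 130, 231, 421, 141, 129, 141, 101],
    'user_id': [24, 10, 24, 12, 24, 10, 10, 24, 21, 12, 12, 10, 12, 17, 24, 12],
    'product_id': [1004, 1041, 1078, 1001, 1001, 1074, 1001, 1019, 1021, 1004, 1001, 1010, 1004, 1004, 1017, 1004],
    'sector': ['a', 'a', 'b', 'd', 'c', 'a', 'c', 'a', 'c', 'a', 'b', 'c', 'a', 'b', 'a', 'a'],
})

# Keep only the rows for product_id 1001, then take the largest order_id per user
max_per_user = df[df['product_id'] == 1001].groupby('user_id')['order_id'].max()
print(max_per_user.to_dict())  # {10: 120, 12: 231, 24: 101}
```

Users 17 and 21 never ordered product 1001, so they do not appear in `max_per_user` at all.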

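For product_id 1004, user_id 10 has no rows with this product, so nothing is returned for that user. For user_id 12 the maximum order_id for product 1004 is 141, and I expect nothing to be returned for user_id 12. user_id 17 has only product_id 1004 and no other product_id, so nothing is returned for user_id 17 either, since there is no other order_id. Finally, for user_id 24 the maximum order_id for product_id 1004 is 134. The row with product_id 1017 has order_id 141, so it is returned.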

So for product_id 1004, I would return:

  order_id  product_id  sector  user_id
    141        1017       a        24

And so on for every other product_id.

So far, I have only grouped by user_id and sorted each group by order_id and product_id:

df3 = df.groupby(['user_id'])
for key, val in df3:
    d = val.sort_values(['order_id', 'product_id'])
    print(d)

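You can group by user_id and, for each group, keep only the rows whose order_id is greater than that user's maximum order_id for the wanted product: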

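import pandas as pd

def get_dataframe_for_product_id(your_input_df, wanted_product_id):
    parts = []
    # Process each user's rows separately
    for _, val in your_input_df.groupby('user_id'):
        # Largest order_id this user has for the wanted product (NaN if none)
        max_order_id = val.loc[val.product_id == wanted_product_id, 'order_id'].max()
        # Keep rows above that maximum; a comparison with NaN keeps nothing
        parts.append(val[val.order_id > max_order_id])
    return pd.concat(parts) if parts else your_input_df.head(0)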
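
The same filter can also be written without an explicit loop. The following is a minimal sketch, not the answer's own method: it assumes the frame from the question, and the names `filter_after_product`, `max_per_user` and `threshold` are hypothetical. It computes each user's maximum order_id for the wanted product once, maps it back onto every row, and compares.

```python
import pandas as pd

df = pd.DataFrame({
    'order_id': [134, 101, 131, 159, 101, 189, 120, 102, 134, 130, 231, 421, 141, 129, 141, 101],
    'user_id': [24, 10, 24, 12, 24, 10, 10, 24, 21, 12, 12, 10, 12, 17, 24, 12],
    'product_id': [1004, 1041, 1078, 1001, 1001, 1074, 1001, 1019, 1021, 1004, 1001, 1010, 1004, 1004, 1017, 1004],
    'sector': ['a', 'a', 'b', 'd', 'c', 'a', 'c', 'a', 'c', 'a', 'b', 'c', 'a', 'b', 'a', 'a'],
})

def filter_after_product(frame, wanted_product_id):
    # Largest order_id per user, looking only at rows for the wanted product
    max_per_user = (frame.loc[frame['product_id'] == wanted_product_id]
                         .groupby('user_id')['order_id']
                         .max())
    # Give every row its user's threshold; users without the product get NaN
    threshold = frame['user_id'].map(max_per_user)
    # A comparison against NaN is False, so those users contribute no rows
    return frame[frame['order_id'] > threshold]

result = filter_after_product(df, 1001)
print(result.sort_values(['user_id', 'product_id']))
```

For product_id 1001 this selects the same six rows as the expected DataFrame in the question (order_id 421 and 189 for user_id 10, and 134, 141, 102 and 131 for user_id 24). Building the threshold with `map` avoids concatenating one piece per user, which may matter on larger frames.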

Source: https://habr.com/ru/post/1683118/

