Pandas: group by one column and keep the row with the maximum date in another column (Python)

I have a DataFrame with the following data:

invoice_no  dealer  billing_change_previous_month        date
       110       1                              0  2016-12-31
       100       1                         -41981  2017-01-30
      5505       2                              0  2017-01-30
      5635       2                          58730  2016-12-31

I want to keep only one row per dealer: the one with the maximum date. The desired result should look like this:

invoice_no  dealer  billing_change_previous_month        date
       100       1                         -41981  2017-01-30
      5505       2                              0  2017-01-30

Each dealer should appear once, with its maximum date. Thanks in advance for your help.

3 answers

You can use boolean indexing with groupby and transform:

df_new = df[df.groupby('dealer').date.transform('max') == df['date']]

   invoice_no  dealer  billing_change_previous_month        date
1         100       1                         -41981  2017-01-30
2        5505       2                              0  2017-01-30
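
To make the steps inside that one-liner visible, here is a minimal sketch that rebuilds the sample frame from the question and splits the expression into its parts. The names latest_per_dealer and mask are illustrative only, and converting the date column with pd.to_datetime is an assumption (the question does not say how the dates are stored).

```python
import pandas as pd

# Sample data copied from the question; dates parsed as datetimes (assumption).
df = pd.DataFrame({
    "invoice_no": [110, 100, 5505, 5635],
    "dealer": [1, 1, 2, 2],
    "billing_change_previous_month": [0, -41981, 0, 58730],
    "date": pd.to_datetime(
        ["2016-12-31", "2017-01-30", "2017-01-30", "2016-12-31"]
    ),
})

# transform('max') returns a Series aligned with df: every row receives
# the latest date of its own dealer.
latest_per_dealer = df.groupby("dealer")["date"].transform("max")

# True only where the row's own date equals its dealer's latest date.
mask = df["date"] == latest_per_dealer

df_new = df[mask]
print(df_new)
```

Because the mask is computed row by row, every row that ties for a dealer's maximum date is kept, so a dealer can appear more than once in the result.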

Tack 1

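Sort the rows and use drop_duplicates. If several rows tie for a dealer's maximum date, this keeps only one of them; use Tack 2 if you need all of them.

df.sort_values(['dealer', 'date'], inplace=True)
df.drop_duplicates('dealer', keep='last', inplace=True)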
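
As a rough, self-contained sketch of the sort-then-deduplicate idea, the snippet below deduplicates on dealer alone with keep='last', so that the latest date (last after an ascending sort) is the row that survives. It avoids inplace=True and writes to a new variable, latest, which is a name chosen here for illustration; parsing the dates with pd.to_datetime is likewise an assumption.

```python
import pandas as pd

# Sample data copied from the question; dates parsed as datetimes (assumption).
df = pd.DataFrame({
    "invoice_no": [110, 100, 5505, 5635],
    "dealer": [1, 1, 2, 2],
    "billing_change_previous_month": [0, -41981, 0, 58730],
    "date": pd.to_datetime(
        ["2016-12-31", "2017-01-30", "2017-01-30", "2016-12-31"]
    ),
})

# After an ascending sort, each dealer's latest date is its last row,
# so keep='last' on the dealer column retains exactly that row.
latest = (
    df.sort_values(["dealer", "date"])
      .drop_duplicates(subset="dealer", keep="last")
)
print(latest)
```

This always yields exactly one row per dealer, which is the main difference from the approaches that keep ties.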

Tack 2

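Use groupby and merge. Compute the maximum date per dealer with groupby, then merge the result back onto the original frame. With how='inner', only the rows whose (dealer, date) pair appears in the groupby result are kept.

Note that rows tying for the maximum date are all kept. If you need exactly one row per dealer, follow up with drop_duplicates.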

df.merge(df.groupby('dealer')['date'].max().reset_index(), 
                             on=['dealer', 'date'], how='inner')

   invoice_no  dealer  billing_change_previous_month        date
0         100       1                         -41981  2017-01-30
1        5505       2                              0  2017-01-30
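
The tie behaviour is easier to see with an extra row. The sketch below adds a made-up invoice (9999, not part of the question's data) that shares dealer 2's maximum date, runs the same groupby and merge, and then reduces the result to one row per dealer. The names max_dates, merged and one_per_dealer are illustrative.

```python
import pandas as pd

# Question's data plus one hypothetical row (invoice 9999) that ties
# with invoice 5505 for dealer 2's maximum date.
df = pd.DataFrame({
    "invoice_no": [110, 100, 5505, 5635, 9999],
    "dealer": [1, 1, 2, 2, 2],
    "billing_change_previous_month": [0, -41981, 0, 58730, 0],
    "date": pd.to_datetime(
        ["2016-12-31", "2017-01-30", "2017-01-30", "2016-12-31", "2017-01-30"]
    ),
})

# One row per dealer holding that dealer's latest date.
max_dates = df.groupby("dealer")["date"].max().reset_index()

# Inner merge keeps every original row whose (dealer, date) pair matches,
# so both tied rows for dealer 2 survive.
merged = df.merge(max_dates, on=["dealer", "date"], how="inner")

# Optional: collapse ties to a single row per dealer.
one_per_dealer = merged.drop_duplicates(subset="dealer")

print(merged)
print(one_per_dealer)
```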

Here is another, more correct solution (fooobar.com/questions/843973 / ...):

df.sort_values('date', ascending=False).groupby('dealer').head(1)
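
When sorting by date in descending order, the latest row of each dealer comes first within its group, so head(1) selects it while tail(1) would select the earliest. A small sketch on the question's data shows both; the variable names are illustrative and the pd.to_datetime conversion is an assumption.

```python
import pandas as pd

# Sample data copied from the question; dates parsed as datetimes (assumption).
df = pd.DataFrame({
    "invoice_no": [110, 100, 5505, 5635],
    "dealer": [1, 1, 2, 2],
    "billing_change_previous_month": [0, -41981, 0, 58730],
    "date": pd.to_datetime(
        ["2016-12-31", "2017-01-30", "2017-01-30", "2016-12-31"]
    ),
})

by_date_desc = df.sort_values("date", ascending=False)

# First row of each dealer in descending date order: the latest invoice.
latest = by_date_desc.groupby("dealer").head(1)

# Last row of each dealer in descending date order: the earliest invoice.
earliest = by_date_desc.groupby("dealer").tail(1)

print(latest)
print(earliest)
```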

Source: https://habr.com/ru/post/1693550/

