How to remove duplicates from a DataFrame, taking into account the value of another column

Question

When I drop John as a duplicate specifying "name" as the column name:

    import pandas as pd

    data = {'name': ['Bill', 'Steve', 'John', 'John', 'John'],
            'age': [21, 28, 22, 30, 29]}
    df = pd.DataFrame(data)
    df = df.drop_duplicates('name')

pandas removes all rows with a matching name, keeping only the first occurrence:

       age   name
    0   21   Bill
    1   28  Steve
    2   22   John

Instead, I would like to keep the row where John has the highest age (in this example, age 30). How can I achieve this?

python pandas dataframe

alphanumeric Oct 16 '16 at 23:15

1 answer

Maxu · Accepted Answer · 2016-10-16T23:19:54+0000

Try the following:

    In [75]: df
    Out[75]:
       age   name
    0   21   Bill
    1   28  Steve
    2   22   John
    3   30   John
    4   29   John

    In [76]: df.sort_values('age').drop_duplicates('name', keep='last')
    Out[76]:
       age   name
    0   21   Bill
    1   28  Steve
    3   30   John
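For anyone who wants to run this outside IPython, here is a minimal sketch of the same idea as a standalone script. The variable name `oldest` and the trailing `sort_index()` call are my additions, not part of the session above; `sort_index()` only restores the original row order after the sort.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "name": ["Bill", "Steve", "John", "John", "John"],
    "age": [21, 28, 22, 30, 29],
})

# Sort ascending by age so that, within each name, the highest age comes last,
# then keep only that last row per name.
oldest = (
    df.sort_values("age")
      .drop_duplicates("name", keep="last")
      .sort_index()  # assumption: you want the rows back in their original order
)

print(oldest)
```

The sort is what makes `keep='last'` pick the highest age: after sorting, the last row for each name is the one with the largest `age`.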

Or, depending on your goals, keep the last occurrence without sorting first:

    In [77]: df.drop_duplicates('name', keep='last')
    Out[77]:
       age   name
    0   21   Bill
    1   28  Steve
    4   29   John
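To make the difference concrete, here is a small sketch that compares `keep='first'` and `keep='last'` on the unsorted frame. The helper variables (`first_rows`, `last_rows`, `john_first`, `john_last`) are just illustrative names. Without a prior sort, which John survives depends only on row position, not on age.

```python
import pandas as pd

# Sample data from the question, left in its original (unsorted) order.
df = pd.DataFrame({
    "name": ["Bill", "Steve", "John", "John", "John"],
    "age": [21, 28, 22, 30, 29],
})

# keep='first' (the default) retains the first row seen for each name.
first_rows = df.drop_duplicates("name", keep="first")

# keep='last' retains the last row seen for each name.
last_rows = df.drop_duplicates("name", keep="last")

# Age of the surviving John row in each case.
john_first = first_rows.loc[first_rows["name"] == "John", "age"].item()
john_last = last_rows.loc[last_rows["name"] == "John", "age"].item()

print(john_first, john_last)
```

Neither result is the maximum age of 30, which is why the sort by `age` is needed when the goal is the oldest row per name.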
