Delete duplicates in Pandas excluding one column

Question

It seems simple, but I cannot find any information about it online.

I have a DataFrame like this:

City    State Zip           Date        Description       
Earlham IA    50072-1036    2014-10-10  Postmarket Assurance: Devices
Earlham IA    50072-1036    2014-10-10  Compliance: Devices
Madrid  IA    50156-1748    2014-09-10  Drug Quality Assurance

How can I remove rows that are duplicates in 4 of the 5 columns? The column that should be ignored is Description.

The result should be:

City    State Zip           Date        Description       
Earlham IA    50072-1036    2014-10-10  Postmarket Assurance: Devices
Madrid  IA    50156-1748    2014-09-10  Drug Quality Assurance

I found online that drop_duplicates with the subset parameter could work, but I'm not sure how to apply it to multiple columns.

+4

python pandas

Jstuff Jul 18 '16 at 20:25

1 answer

ayhan · Accepted Answer · 2016-07-18T20:29:00+0000

In fact, you have already found the solution. For multiple columns, pass a list as subset:

df.drop_duplicates(subset=['City', 'State', 'Zip', 'Date'])

Or specify only the column to ignore and compare on all the others:

df.drop_duplicates(subset=df.columns.difference(['Description']))
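As a quick check, here is a minimal sketch that rebuilds the sample data from the question and runs both calls. The variable names by_list and by_exclusion are only illustrative, and the columns are assumed to be plain strings. The difference is converted to a list here purely for readability.

```python
import pandas as pd

# Sample data copied from the question; all columns kept as strings.
df = pd.DataFrame({
    "City": ["Earlham", "Earlham", "Madrid"],
    "State": ["IA", "IA", "IA"],
    "Zip": ["50072-1036", "50072-1036", "50156-1748"],
    "Date": ["2014-10-10", "2014-10-10", "2014-09-10"],
    "Description": [
        "Postmarket Assurance: Devices",
        "Compliance: Devices",
        "Drug Quality Assurance",
    ],
})

# Option 1: list the columns that define a duplicate.
by_list = df.drop_duplicates(subset=["City", "State", "Zip", "Date"])

# Option 2: compare on every column except Description.
other_columns = df.columns.difference(["Description"]).tolist()
by_exclusion = df.drop_duplicates(subset=other_columns)

print(by_list)
```

Both calls should keep the first Earlham row and the Madrid row, because keep='first' is the default. Passing keep='last' would keep the "Compliance: Devices" row instead. The original index labels are preserved unless you also call reset_index(drop=True).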
