Delete duplicates in Pandas excluding one column

It seems simple, but I cannot find any information about it online.

I have a DataFrame like this:

City    State Zip           Date        Description       
Earlham IA    50072-1036    2014-10-10  Postmarket Assurance: Devices
Earlham IA    50072-1036    2014-10-10  Compliance: Devices
Madrid  IA    50156-1748    2014-09-10  Drug Quality Assurance

How can I remove rows that are duplicates in 4 of the 5 columns? The column that should be ignored is Description.

The expected result is:

City    State Zip           Date        Description       
Earlham IA    50072-1036    2014-10-10  Postmarket Assurance: Devices
Madrid  IA    50156-1748    2014-09-10  Drug Quality Assurance

I found online that drop_duplicates with the subset parameter might work, but I'm not sure how to apply it to multiple columns.

1 answer

You have in fact already found the solution. For multiple columns, pass a list of column names as subset:

df.drop_duplicates(subset=['City', 'State', 'Zip', 'Date']) 
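A minimal runnable sketch, rebuilding the sample data from the question (storing Zip as a string is an assumption, so the "-1036" suffix survives). Only the four listed columns are compared, and the default keep="first" keeps the first row of each duplicate group:

```python
import pandas as pd

# Sample data from the question; Zip is stored as a string (assumption).
df = pd.DataFrame({
    "City": ["Earlham", "Earlham", "Madrid"],
    "State": ["IA", "IA", "IA"],
    "Zip": ["50072-1036", "50072-1036", "50156-1748"],
    "Date": ["2014-10-10", "2014-10-10", "2014-09-10"],
    "Description": [
        "Postmarket Assurance: Devices",
        "Compliance: Devices",
        "Drug Quality Assurance",
    ],
})

# Rows are compared only on these four columns; Description is ignored.
# keep="first" (the default) retains the first row of each duplicate group.
result = df.drop_duplicates(subset=["City", "State", "Zip", "Date"])
print(result)
```

Note that drop_duplicates returns a new DataFrame and leaves df unchanged (unless inplace=True is passed). The surviving rows keep their original index labels (0 and 2 here); pass ignore_index=True to renumber them.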

Or, more simply, specify only the column to ignore:

df.drop_duplicates(df.columns.difference(['Description']))
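A short sketch of the same idea on the question's sample data (rebuilt here as an assumption). Index.difference returns the remaining column names as a sorted Index, which drop_duplicates accepts as subset. It also shows keep="last", which keeps the last row of each duplicate group instead of the first:

```python
import pandas as pd

# Sample data from the question; Zip is stored as a string (assumption).
df = pd.DataFrame({
    "City": ["Earlham", "Earlham", "Madrid"],
    "State": ["IA", "IA", "IA"],
    "Zip": ["50072-1036", "50072-1036", "50156-1748"],
    "Date": ["2014-10-10", "2014-10-10", "2014-09-10"],
    "Description": [
        "Postmarket Assurance: Devices",
        "Compliance: Devices",
        "Drug Quality Assurance",
    ],
})

# Every column except Description, returned as a sorted Index.
cols = df.columns.difference(["Description"])

# Default keep="first": same result as listing the four columns explicitly.
first = df.drop_duplicates(subset=cols)

# keep="last": the last row of each duplicate group survives instead.
last = df.drop_duplicates(subset=cols, keep="last")

print(first)
print(last)
```

This form is handy when the DataFrame has many columns and only one or two should be excluded from the comparison.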

Source: https://habr.com/ru/post/1648295/
