df.unique() on an entire DataFrame based on one column

I have a DataFrame df filled with rows and columns where there are duplicate identifiers:

Index   Id   Type
0       a1   A
1       a2   A
2       b1   B
3       b3   B
4       a1   A
...

When I use:

uniqueId = df["Id"].unique() 

I get an array of the unique identifiers.
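
For context, a minimal sketch that rebuilds the sample frame above (values taken from the question's table) and shows what unique() returns:

import pandas as pd

# Rebuild the example DataFrame from the question
df = pd.DataFrame({'Id': ['a1', 'a2', 'b1', 'b3', 'a1'],
                   'Type': ['A', 'A', 'B', 'B', 'A']})
df.index.name = 'Index'

uniqueId = df['Id'].unique()
print(uniqueId)
#['a1' 'a2' 'b1' 'b3']  <- a NumPy array of values only; the Type column is lost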

How can I apply this filtering to the entire DataFrame, so that its structure is kept but the duplicate rows (based on "Id") are removed?

1 answer

It seems you need DataFrame.drop_duplicates with the parameter subset, which specifies the column(s) to check for duplicates:

#keep the first occurrence of each Id (keep='first' is the default)
df = df.drop_duplicates(subset=['Id'])
print (df)
       Id Type
Index         
0      a1    A
1      a2    A
2      b1    B
3      b3    B

#keep the last occurrence of each Id
df = df.drop_duplicates(subset=['Id'], keep='last')
print (df)
       Id Type
Index         
1      a2    A
2      b1    B
3      b3    B
4      a1    A

#drop every row whose Id appears more than once
df = df.drop_duplicates(subset=['Id'], keep=False)
print (df)
       Id Type
Index         
1      a2    A
2      b1    B
3      b3    B
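
As a related alternative (not in the original answer), the same filters can be written with DataFrame.duplicated and boolean indexing, which makes the mask reusable for inspection before dropping rows:

#equivalent to df.drop_duplicates(subset=['Id'], keep='first')
mask = df.duplicated(subset=['Id'], keep='first')
df = df[~mask]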