Selecting unique observations in a pandas data frame

I have a pandas data frame with a uniqueid column. I would like to remove all duplicates from the data frame based on this column, so that each uniqueid appears only once among the remaining observations.

There is also a drop_duplicates() method on any data frame (see the pandas docs). Pass the specific column(s) to check for duplicates through its subset argument.

 df.drop_duplicates(subset='uniqueid', inplace=True) 
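A minimal, self-contained sketch of the same call on a small invented frame (only the column name uniqueid comes from the question; the data is made up for illustration). By default keep='first' retains the first row seen for each id, while keep='last' retains the last one. Assigning the result instead of using inplace=True is an equally valid style; with inplace=True the call modifies df and returns None.

```python
import pandas as pd

# Invented sample data: uniqueid 2 appears twice.
df = pd.DataFrame({
    "uniqueid": [1, 2, 2, 3],
    "value": ["x", "y", "z", "w"],
})

# Default keep="first": the first row for each uniqueid survives.
first = df.drop_duplicates(subset="uniqueid")

# keep="last": the last row for each uniqueid survives instead.
last = df.drop_duplicates(subset="uniqueid", keep="last")

print(first)
print(last)
```

Note that the original index labels are preserved (0, 1, 3 for keep='first'); call reset_index(drop=True) afterwards if you want a fresh 0..n-1 index.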

Use the duplicated method

Since we only care whether uniqueid (A in my example) is duplicated, select that column and call duplicated on the resulting Series. Then use ~ to flip the booleans, so that rows which are not repeats are kept.

    In [90]: df = pd.DataFrame({'A': ['a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4]})

    In [91]: df
    Out[91]:
       A  B
    0  a  1
    1  b  2
    2  b  3
    3  c  4

    In [92]: df['A'].duplicated()
    Out[92]:
    0    False
    1    False
    2     True
    3    False
    Name: A, dtype: bool

    In [93]: df.loc[~df['A'].duplicated()]
    Out[93]:
       A  B
    0  a  1
    1  b  2
    3  c  4
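A boolean mask also makes it easy to pick a different policy. As a sketch on the same toy frame: passing keep=False to duplicated flags every row whose value occurs more than once, so ~ keeps only the ids that appear exactly once and both 'b' rows are discarded. Which behaviour you want depends on your data; the default mask selects the same rows as drop_duplicates(subset='A').

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4]})

# Default keep="first": only the second 'b' is flagged, so one 'b' row survives.
dedup_first = df.loc[~df['A'].duplicated()]

# keep=False: both 'b' rows are flagged, so only ids occurring once survive.
only_unique = df.loc[~df['A'].duplicated(keep=False)]

# The default mask selects the same rows as drop_duplicates on column A.
assert dedup_first.equals(df.drop_duplicates(subset='A'))

print(only_unique)
```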

Source: https://habr.com/ru/post/957155/
