Selecting unique observations in a pandas data frame

Question

I have a pandas data frame with a uniqueid column. I would like to remove all duplicates from the data frame based on this column, so that the remaining rows are unique with respect to uniqueid.

python pandas

Michael Oct 31 '13 at 23:43

2 answers

Use the duplicated method

Since we only care whether uniqueid (A in my example) is duplicated, select that column and call duplicated on the resulting Series. Then use ~ to flip the booleans, so that True marks the rows to keep.

 In [90]: df = pd.DataFrame({'A': ['a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4]})

 In [91]: df
 Out[91]:
    A  B
 0  a  1
 1  b  2
 2  b  3
 3  c  4

 In [92]: df['A'].duplicated()
 Out[92]:
 0    False
 1    False
 2     True
 3    False
 Name: A, dtype: bool

 In [93]: df.loc[~df['A'].duplicated()]
 Out[93]:
    A  B
 0  a  1
 1  b  2
 3  c  4
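The same steps as a standalone script, a minimal sketch reusing the toy frame from the session. It also shows the keep parameter of duplicated (present in current pandas releases; very old versions used take_last instead), which decides which copy of a repeated value counts as the original:

```python
import pandas as pd

# Toy frame from the session; column 'A' plays the role of uniqueid.
df = pd.DataFrame({'A': ['a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4]})

# keep='first' (the default): repeats after the first occurrence are flagged True.
first_kept = df.loc[~df['A'].duplicated(keep='first')]

# keep='last': earlier repeats are flagged instead, so the last 'b' row survives.
last_kept = df.loc[~df['A'].duplicated(keep='last')]

# keep=False: every copy of a repeated value is flagged, so 'b' disappears entirely.
only_singletons = df.loc[~df['A'].duplicated(keep=False)]

print(first_kept['B'].tolist())       # [1, 2, 4]
print(last_kept['B'].tolist())        # [1, 3, 4]
print(only_singletons['B'].tolist())  # [1, 4]
```

The default keep='first' reproduces Out[93] exactly, including the gap in the index (0, 1, 3).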

TomAugspurger Nov 01 '13 at 1:35

cwharland · Accepted Answer · 2013-11-01T04:13:27+0000

There is also a drop_duplicates() method for any data frame (see the pandas documentation). You can pass the specific column or columns to check for duplicates via the subset argument.

 df.drop_duplicates(subset='uniqueid', inplace=True)
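A minimal sketch comparing the two answers, assuming a frame whose key column is named uniqueid as in the question, plus an illustrative column named value (hypothetical). Without inplace=True, drop_duplicates returns a new frame, and with the default keep='first' it keeps the same rows as the boolean mask built from duplicated:

```python
import pandas as pd

# Hypothetical data: uniqueid 2 appears twice; 'value' is only for illustration.
df = pd.DataFrame({'uniqueid': [1, 2, 2, 3], 'value': ['x', 'y', 'z', 'w']})

# Returns a new, de-duplicated frame; df itself is left unchanged.
deduped = df.drop_duplicates(subset='uniqueid')

# Equivalent selection using duplicated() and the ~ operator.
masked = df.loc[~df['uniqueid'].duplicated()]

assert deduped.equals(masked)     # same rows and same index labels
print(deduped['value'].tolist())  # ['x', 'y', 'w']
print(len(df))                    # 4, because df was not modified
```

Both approaches keep the original index labels (0, 1, 3 here); chain .reset_index(drop=True) if you want a fresh 0..n-1 index.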
