How to remove duplicates from a subset of rows in a pandas DataFrame?

Question

I have a dataframe like this:

A   B       C
12  true    1
12  true    1
3   nan     2
3   nan     3

I would like to delete every row whose value in column A repeats an earlier one, but only if that row's value in column B is "true".

The resulting DataFrame I want is:

A   B       C
12  true    1
3   nan     2
3   nan     3

I tried df.loc[df['B']=='true'].drop_duplicates('A', inplace=True, keep='first'), but it does not work.
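Why the attempt above leaves df unchanged: boolean indexing with df.loc[mask] builds a new DataFrame, so inplace=True removes the duplicates from that temporary object, which is then thrown away. The sketch below assumes column B holds the boolean True and NaN, as the answers' printed output suggests; if B actually stores the string "true", the comparison in the question is the right one, but the inplace problem is the same.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (assumption: B holds real booleans and NaN)
df = pd.DataFrame({"A": [12, 12, 3, 3],
                   "B": [True, True, np.nan, np.nan],
                   "C": [1, 1, 2, 3]})

# df.loc[mask] returns a new DataFrame; inplace=True modifies that temporary
# object, and nothing is written back to df (pandas may emit a
# SettingWithCopyWarning here, depending on the version).
df.loc[df["B"].eq(True)].drop_duplicates("A", inplace=True, keep="first")

print(len(df))  # 4: no rows were removed from df
```

The fix is to build the result from df and assign it, rather than relying on inplace on a filtered slice.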

Thank you for your help!

python pandas

Tatsuya Feb 22 '18 at 19:04

2 answers

df[df.B.ne(True) | ~df.A.duplicated()]

    A     B  C
0  12  True  1
2   3   NaN  2
3   3   NaN  3
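A runnable version of the one-liner above, assuming column B holds the boolean True and NaN. df.B.ne(True) is True wherever B is not True (NaN included), and ~df.A.duplicated() is True at the first occurrence of each A value; a row is kept if either condition holds.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [12, 12, 3, 3],
                   "B": [True, True, np.nan, np.nan],
                   "C": [1, 1, 2, 3]})

not_true = df.B.ne(True)          # [False, False, True, True]
first_of_a = ~df.A.duplicated()   # [True, False, True, False]

# Keep the row if B is not True, or if it is the first row with this A value
result = df[not_true | first_of_a]
print(result)  # keeps index labels 0, 2 and 3
```

Note that df.A.duplicated() scans column A across all rows, not only the rows where B is True, so a True row whose A value already appeared in a non-True row is dropped as well.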

piRSquared Feb 22 '18 at 19:41

Wen · Accepted Answer · 2018-02-22T19:07:54+0000

You can split df on column B, drop duplicates of A only in the part where B is True, and recombine the two parts with pd.concat:

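df = pd.concat([df.loc[df.B != True],                                      # keep every row where B is not True
                df.loc[df.B == True].drop_duplicates(['A'], keep='first')  # among True rows, keep the first of each A
               ]).sort_index()                                             # restore the original row order
df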

Out[1593]: 
    A     B  C
0  12  True  1
2   3   NaN  2
3   3   NaN  3
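A sketch comparing the two answers, assuming B holds True and NaN. The function names dedup_concat, dedup_mask and dedup_mask_true_only, and the edge-case frame, are hypothetical and only for illustration. On the question's data all three return the same rows; they differ when an A value first appears with B not True and later with B True, because the single-mask version checks duplicates in A across every row. dedup_mask_true_only keeps the one-mask style but uses df.duplicated(['A', 'B']), so a True row only counts as a duplicate of an earlier True row, which matches the concat approach.

```python
import numpy as np
import pandas as pd

def dedup_concat(df):
    """Split on B, dedupe A only among True rows, recombine in original order."""
    is_true = df.B.eq(True)
    return pd.concat([df[~is_true],
                      df[is_true].drop_duplicates("A", keep="first")]).sort_index()

def dedup_mask(df):
    """Single mask; duplicated() looks at column A across ALL rows."""
    return df[df.B.ne(True) | ~df.A.duplicated()]

def dedup_mask_true_only(df):
    """Single mask; a row is a duplicate only if both A and B repeat."""
    return df[df.B.ne(True) | ~df.duplicated(["A", "B"])]

question = pd.DataFrame({"A": [12, 12, 3, 3],
                         "B": [True, True, np.nan, np.nan],
                         "C": [1, 1, 2, 3]})

# Hypothetical edge case: A=3 first appears with B=NaN, later with B=True
edge = pd.DataFrame({"A": [3, 3, 12, 12],
                     "B": [np.nan, True, True, True],
                     "C": [2, 5, 1, 1]})

for f in (dedup_concat, dedup_mask, dedup_mask_true_only):
    print(f.__name__, f(question).index.tolist(), f(edge).index.tolist())
# dedup_concat [0, 2, 3] [0, 1, 2]
# dedup_mask [0, 2, 3] [0, 2]
# dedup_mask_true_only [0, 2, 3] [0, 1, 2]
```

Which behaviour is correct depends on whether a True row should be removed when its A value was first seen in a non-True row; the question's sample data does not settle that.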
