How to select duplicate rows using pandas?

Question

I have a dataframe like this:

import pandas as pd
dic = {'A':[100,200,250,300],
       'B':['ci','ci','po','pa'],
       'C':['s','t','p','w']}
df = pd.DataFrame(dic)

My goal is to split the rows into two data frames:

df1 = contains all rows whose value in column B is not repeated (unique rows).
df2 = contains only the rows whose value in column B is repeated.

The result should look like this:

df1 =      A  B C         df2 =     A  B C
      0  250 po p               0  100 ci s
      1  300 pa w               1  200 ci t

Note:

- The data can be very large, with many repeated values in column B, so the answer should be as general as possible.
- If there are no duplicates, df2 should be empty and all rows should be in df1.

python pandas dataframe subtraction divide

Federico gentile Dec 08 '16 at 15:27

1 answer

jezrael · Accepted Answer · 2016-12-08T15:29:36+0000

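Use Series.duplicated with keep=False to build a boolean mask that marks every occurrence of a repeated value, then select rows with boolean indexing; ~ inverts the mask: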

mask = df.B.duplicated(keep=False)
print (mask)
0     True
1     True
2    False
3    False
Name: B, dtype: bool

print (df[mask])
     A   B  C
0  100  ci  s
1  200  ci  t

print (df[~mask])
     A   B  C
2  250  po  p
3  300  pa  w
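
To get df1 and df2 exactly as laid out in the question (with the index restarting at 0), the mask approach above can be wrapped in a small helper. This is only a sketch: the function name split_by_duplicates and the second frame df_unique are made up for illustration, and reset_index(drop=True) is an optional extra step that is not part of the original answer.

```python
import pandas as pd


def split_by_duplicates(df, column):
    """Return (unique_rows, duplicated_rows) based on repeated values in `column`.

    Hypothetical helper name; it only wraps the mask technique shown above.
    """
    # keep=False marks every occurrence of a repeated value as True
    mask = df[column].duplicated(keep=False)
    # reset_index(drop=True) renumbers the rows from 0 (optional)
    unique_rows = df[~mask].reset_index(drop=True)
    duplicated_rows = df[mask].reset_index(drop=True)
    return unique_rows, duplicated_rows


df = pd.DataFrame({'A': [100, 200, 250, 300],
                   'B': ['ci', 'ci', 'po', 'pa'],
                   'C': ['s', 't', 'p', 'w']})

df1, df2 = split_by_duplicates(df, 'B')
print(df1)
print(df2)

# Illustrative frame with no repeated values in B: the second result is empty
df_unique = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y'], 'C': ['a', 'b']})
only_unique, only_dups = split_by_duplicates(df_unique, 'B')
print(only_dups.empty)
```

When column B has no repeated values, the mask is all False, so the second frame comes back empty and every row lands in the first one, as the question asks.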