How to delete unique rows in a pandas DataFrame?

I am stuck on a seemingly easy problem: discarding unique rows in a pandas DataFrame. Basically, the opposite of drop_duplicates().

Let's say this is my data:

         A  B  C
    0  foo  0  A
    1  foo  1  A
    2  foo  1  B
    3  bar  1  A

I would like to delete the rows whose combination of A and B is unique, that is, keep only rows 1 and 2.

I tried the following:

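    import pandas as pd

    # Load the DataFrame
    df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"], "B": [0, 1, 1, 1], "C": ["A", "A", "B", "A"]})
    # drop_duplicates() keeps the first occurrence of each (A, B) pair
    uniques = df[['A', 'B']].drop_duplicates()
    duplicates = df[~df.index.isin(uniques.index)]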

But I only get row 2, since rows 0, 1 and 3 are all in uniques!

+5
2 answers

Solutions for selecting all duplicate rows:

You can use duplicated with a subset and the parameter keep=False to select all duplicates:

    df = df[df.duplicated(subset=['A','B'], keep=False)]
    print(df)
         A  B  C
    1  foo  1  A
    2  foo  1  B
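To see why keep=False matters here, the following minimal sketch compares the masks that duplicated returns for each keep setting on the question's data. The variable names (mask_first, mask_last, mask_all) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                   "B": [0, 1, 1, 1],
                   "C": ["A", "A", "B", "A"]})

# keep='first' (the default): only the later repeats are flagged
mask_first = df.duplicated(subset=["A", "B"], keep="first")
# keep='last': only the earlier repeats are flagged
mask_last = df.duplicated(subset=["A", "B"], keep="last")
# keep=False: every row of a repeated (A, B) pair is flagged
mask_all = df.duplicated(subset=["A", "B"], keep=False)

print(mask_first.tolist())  # [False, False, True, False]
print(mask_last.tolist())   # [False, True, False, False]
print(mask_all.tolist())    # [False, True, True, False]
```

Only the keep=False mask marks both rows 1 and 2, which is why it selects all duplicates rather than all but one.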

Solution with transform:

    df = df[df.groupby(['A', 'B'])['A'].transform('size') > 1]
    print(df)
         A  B  C
    1  foo  1  A
    2  foo  1  B
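Looking at the intermediate result may make the transform approach clearer: transform('size') returns one value per row, aligned with the index of df, giving the size of the (A, B) group that row belongs to. A small sketch on the question's data, where group_size and repeated are names chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                   "B": [0, 1, 1, 1],
                   "C": ["A", "A", "B", "A"]})

# One group size per original row, in the original row order
group_size = df.groupby(["A", "B"])["A"].transform("size")
print(group_size.tolist())  # [1, 2, 2, 1]

# Rows whose (A, B) pair occurs more than once
repeated = df[group_size > 1]
print(repeated.index.tolist())  # [1, 2]
```

Because the result is aligned with df, it can be compared against any threshold, for example > 2 to keep only pairs that occur at least three times.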

Slightly modified solutions for selecting all unique rows:

    # invert the boolean mask with ~
    df = df[~df.duplicated(subset=['A','B'], keep=False)]
    print(df)
         A  B  C
    0  foo  0  A
    3  bar  1  A

    df = df[df.groupby(['A', 'B'])['A'].transform('size') == 1]
    print(df)
         A  B  C
    0  foo  0  A
    3  bar  1  A
+5

I came up with a solution using groupby:

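    grouped = df.groupby(['A', 'B']).size().reset_index(name='count')
    uniques = grouped[grouped['count'] == 1]
    # Match on the (A, B) values: the index of grouped is positional and unrelated to df.index
    is_unique = df.set_index(['A', 'B']).index.isin(uniques.set_index(['A', 'B']).index)
    duplicates = df[~is_unique]

duplicates now holds the correct result:

         A  B  C
    1  foo  1  A
    2  foo  1  B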

Also, my initial attempt at the question can be fixed by simply adding keep=False to the drop_duplicates method:

    # Load the DataFrame
    df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"], "B": [0, 1, 1, 1], "C": ["A", "A", "B", "A"]})
    uniques = df[['A', 'B']].drop_duplicates(keep=False)
    duplicates = df[~df.index.isin(uniques.index)]

Please prefer @jezrael's answer; I think it is safer (?), since I rely on pandas indices here.
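One likely reason for that caution: df.index.isin(...) matches on index labels, so the index-based variant assumes the labels of df are unique. The sketch below uses a deliberately non-unique index (an assumption made for illustration, not part of the question) to show how the two approaches can diverge; by_index and by_mask are hypothetical names.

```python
import pandas as pd

# Same data as in the question, but with repeated index labels
df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                   "B": [0, 1, 1, 1],
                   "C": ["A", "A", "B", "A"]},
                  index=[0, 1, 0, 1])

# Index-based variant: the unique rows carry labels 0 and 1,
# which also happen to be the labels of the duplicated rows
uniques = df[["A", "B"]].drop_duplicates(keep=False)
by_index = df[~df.index.isin(uniques.index)]

# Boolean-mask variant: works row by row, independent of the labels
by_mask = df[df.duplicated(subset=["A", "B"], keep=False)]

print(len(by_index))  # 0
print(len(by_mask))   # 2
```

With the default RangeIndex from the question both variants return rows 1 and 2, so the difference only shows up when labels repeat.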

0

Source: https://habr.com/ru/post/1269450/

