Remove duplicate rows from Pandas dataframe where only some columns have the same value

I have a pandas dataframe as follows:

A   B   C
1   2   x
1   2   y
3   4   z
3   5   x

I want to keep only one row from each group of rows that share the same values in certain columns, here columns A and B. In other words, if a combination of values in A and B occurs more than once in the dataframe, only one of those rows should remain (it does not matter which one).

FWIW: at most two rows ever share the same values in A and B.

The result should look like this:

A   B   C
1   2   x
3   4   z
3   5   x

or

A   B   C
1   2   y
3   4   z
3   5   x
1 answer

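Use drop_duplicates with the subset parameter. By default the first row of each duplicate group is kept; pass keep='last' to keep the last one instead: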

df1 = df.drop_duplicates(subset=['A','B'])
# same as
# df1 = df.drop_duplicates(subset=['A','B'], keep='first')
print(df1)
   A  B  C
0  1  2  x
2  3  4  z
3  3  5  x

df2 = df.drop_duplicates(subset=['A','B'], keep='last')
print(df2)
   A  B  C
1  1  2  y
2  3  4  z
3  3  5  x
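
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It assumes the dataframe from the question is built by hand; the variable names dup_mask, first, last and first_reset are only illustrative. It also shows duplicated, which marks the affected rows without dropping anything, and reset_index, in case the gaps left in the original index are unwanted.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": [1, 1, 3, 3],
    "B": [2, 2, 4, 5],
    "C": ["x", "y", "z", "x"],
})

# Mark every row whose (A, B) pair occurs more than once (nothing is dropped)
dup_mask = df.duplicated(subset=["A", "B"], keep=False)

# Keep the first or the last row of each (A, B) group
first = df.drop_duplicates(subset=["A", "B"], keep="first")
last = df.drop_duplicates(subset=["A", "B"], keep="last")

# The original index labels survive the drop; renumber them if needed
first_reset = first.reset_index(drop=True)

print(dup_mask.tolist())
print(first)
print(last)
print(first_reset)
```

Since the question says it does not matter which row survives, either keep value works. Note that keep=False in drop_duplicates would remove both rows of a duplicated pair, which is not what is asked for here.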

Source: https://habr.com/ru/post/1678978/

