I have a data frame with two columns, A and B. In this context, the order of A and B is unimportant; for example, I would consider (0, 50) and (50, 0) to be duplicates. In pandas, what is an efficient way to remove these duplicates from a data frame?
    import pandas as pd

    # Initial data frame.
    data = pd.DataFrame({'A': [0, 10, 11, 21, 22, 35, 5, 50],
                         'B': [50, 22, 35, 5, 10, 11, 21, 0]})
    data
        A   B
    0   0  50
    1  10  22
    2  11  35
    3  21   5
    4  22  10
    5  35  11
    6   5  21
    7  50   0

    # Desired output with "duplicates" removed.
    data2 = pd.DataFrame({'A': [0, 5, 10, 11],
                          'B': [50, 21, 22, 35]})
    data2
        A   B
    0   0  50
    1   5  21
    2  10  22
    3  11  35
Ideally, the output will be sorted by the values in column A.
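One possible approach, offered as a minimal sketch rather than the definitive answer: sort the two values within each row so that (50, 0) and (0, 50) collapse to the same pair, then drop exact duplicates and sort by A. The names `pairs` and `result` are illustrative, and the sketch assumes both columns hold comparable numeric values. Note that it always places the smaller value of each pair in column A, which happens to match the desired output shown above.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': [0, 10, 11, 21, 22, 35, 5, 50],
                     'B': [50, 22, 35, 5, 10, 11, 21, 0]})

# Sort the values within each row so that the order of A and B no longer matters:
# after this step, (50, 0) and (0, 50) are both stored as (0, 50).
pairs = pd.DataFrame(np.sort(data[['A', 'B']].to_numpy(), axis=1),
                     columns=['A', 'B'], index=data.index)

# Drop the now-identical rows, sort by column A, and renumber the index.
result = (pairs.drop_duplicates()
               .sort_values('A')
               .reset_index(drop=True))
```

If the original orientation of each surviving row should be kept instead, `data[~pairs.duplicated()]` selects the first occurrence of each unordered pair from the original frame; that variant would keep the row (21, 5) rather than (5, 21).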