I have a DataFrame and I want to remove duplicate rows that contain the same values, but in different columns:
import pandas as pd

df = pd.DataFrame(columns=['a','b','c','d'], index=['1','2','3'])
df.loc['1'] = pd.Series({'a':'x','b':'y','c':'e','d':'f'})
df.loc['2'] = pd.Series({'a':'e','b':'f','c':'x','d':'y'})
df.loc['3'] = pd.Series({'a':'w','b':'v','c':'s','d':'t'})
df
Out[8]:
   a  b  c  d
1  x  y  e  f
2  e  f  x  y
3  w  v  s  t
Rows [1] and [2] contain the same values {x, y, e, f}, but arranged crosswise: if you swap columns c, d with a, b in row [2], you get a duplicate of row [1]. I want to drop such rows and keep only one of them, to get the final output:
df_new
Out[20]:
   a  b  c  d
1  x  y  e  f
3  w  v  s  t
How can I do this efficiently?
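For reference, here is a minimal sketch of one possible approach. It assumes that two rows count as duplicates whenever they hold the same values in any column order, which is broader than the exact "swap c, d with a, b" case described above. The names `sorted_vals` and `is_dup` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["x", "y", "e", "f"],
     ["e", "f", "x", "y"],
     ["w", "v", "s", "t"]],
    columns=["a", "b", "c", "d"],
    index=["1", "2", "3"],
)

# Assumption: column order is irrelevant, so sort the values within each row.
sorted_vals = np.sort(df.to_numpy(), axis=1)

# Flag rows whose sorted values already appeared in an earlier row.
is_dup = pd.DataFrame(sorted_vals, index=df.index).duplicated()

# Keep only the first occurrence, with the original column order intact.
df_new = df[~is_dup]
print(df_new)
```

The sort happens on a copy of the values, so the rows that are kept still show their original column layout. If only the specific a, b / c, d swap should count as a duplicate, the comparison key would need to be built differently.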