I have a data frame that gives pairwise correlations of scores for people in the same state. Below is a small example of what I want to do with this data; my actual dataset contains 15 million rows of pair correlations and many more columns.
The following are sample data:
>sample_data
Pair_1ID Pair_2ID CORR
1 2 0.12
1 3 0.23
2 1 0.12
2 3 0.75
3 1 0.23
3 2 0.75
I want to create a new data frame without duplicates. For example, line 1 says the correlation between persons 1 and 2 is 0.12. Line 3 shows the correlation between 2 and 1, which is the same information. Since these rows are redundant, I would like the final file to contain each pair only once, as follows:
>output
Pair_1ID Pair_2ID CORR
1 2 0.12
1 3 0.23
2 3 0.75
Can anyone help? The unique() function does not work for this, because the two IDs are swapped between the duplicate rows, and I do not know how else to do it.
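One possible approach, offered as a minimal sketch rather than a definitive answer: order the two IDs within each row so that (2, 1) and (1, 2) produce the same key, then keep the first row for each key. The sketch assumes the data is an R data frame named sample_data with the columns shown above; the helper names lo, hi, and output are made up for illustration. It uses only base R (pmin, pmax, duplicated) and has not been timed on 15 million rows.

```r
# Sample data from the question (column names as shown there).
sample_data <- data.frame(
  Pair_1ID = c(1, 1, 2, 2, 3, 3),
  Pair_2ID = c(2, 3, 1, 3, 1, 2),
  CORR     = c(0.12, 0.23, 0.12, 0.75, 0.23, 0.75)
)

# Order the two IDs within each row (hypothetical helper vectors),
# so that (2, 1) and (1, 2) both become the key (1, 2).
lo <- pmin(sample_data$Pair_1ID, sample_data$Pair_2ID)
hi <- pmax(sample_data$Pair_1ID, sample_data$Pair_2ID)

# Keep only the first row seen for each unordered pair.
output <- sample_data[!duplicated(data.frame(lo, hi)), ]
rownames(output) <- NULL

print(output)
```

If every pair is guaranteed to appear in both directions, as in the sample, a simpler filter such as sample_data[sample_data$Pair_1ID < sample_data$Pair_2ID, ] should give the same rows and is likely cheaper, since it avoids the duplicate check entirely. The extra columns in the real dataset are carried along unchanged in either version.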