Extract rows with unique ID combinations from a data frame in R

I have a data frame of pairwise correlations between the scores of people in the same state. Below is a small example of what I want to do; my actual dataset has 15 million rows of pairwise correlations and many additional columns.

The following are sample data:

>sample_data

Pair_1ID    Pair_2ID    CORR    
1           2           0.12    
1           3           0.23    
2           1           0.12    
2           3           0.75    
3           1           0.23    
3           2           0.75    

I want to create a new data frame without duplicates. For example, row 1 gives the correlation between persons 1 and 2 as 0.12, and row 3 gives the same correlation for persons 2 and 1. Since both rows carry the same information, I would like the final output to contain each pair only once, like this:

>output

Pair_1ID    Pair_2ID    CORR
1           2           0.12
1           3           0.23
2           3           0.75

Can anyone help? The unique command does not handle this, and I do not know how else to do it.


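If every pair appears in both orders, you can simply keep the rows where the first ID is not larger than the second: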

subset(sample_data , Pair_1ID <= Pair_2ID)
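
The filter above silently drops data if some pair is listed in only one order with the larger ID first. A minimal sketch of a sanity check, using the sample data from the question rebuilt by hand (the names forward, backward, and all_mirrored are just illustrative):

```r
# Sample data from the question, rebuilt so the sketch is self-contained.
sample_data <- data.frame(
  Pair_1ID = c(1, 1, 2, 2, 3, 3),
  Pair_2ID = c(2, 3, 1, 3, 1, 2),
  CORR     = c(0.12, 0.23, 0.12, 0.75, 0.23, 0.75)
)

# One key per row as given, and one for the same pair in reverse order.
forward  <- paste(sample_data$Pair_1ID, sample_data$Pair_2ID)
backward <- paste(sample_data$Pair_2ID, sample_data$Pair_1ID)

# TRUE only if every row has a mirrored counterpart somewhere in the data.
all_mirrored <- all(backward %in% forward)

# Safe to filter on ID order only when all_mirrored is TRUE.
result <- subset(sample_data, Pair_1ID <= Pair_2ID)
```

On 15 million rows the paste() keys cost extra memory, so this is probably best run once as a check rather than kept in the main pipeline.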

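If some pairs may appear in only one order, relabel each pair so that the smaller ID comes first, then drop the duplicate rows: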

unique(transform(sample_data, Pair_1ID = pmin(Pair_1ID, Pair_2ID),
                              Pair_2ID = pmax(Pair_1ID, Pair_2ID)))

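Note: here CORR is also compared by unique, which is risky because it is a floating-point value. Two rows for the same pair whose CORR values differ slightly would both be kept. To deduplicate on the ID pair alone: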

relabeled <- transform(sample_data, Pair_1ID = pmin(Pair_1ID, Pair_2ID),
                                    Pair_2ID = pmax(Pair_1ID, Pair_2ID))
subset(relabeled, !duplicated(cbind(Pair_1ID, Pair_2ID)))
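
To see why the key-only version can matter, here is a small sketch with made-up data (the noisy data frame and its CORR values are hypothetical, not from the question) in which the two copies of a pair carry slightly different CORR values:

```r
# Hypothetical data: pair (1, 2) appears in both orders, but the two CORR
# values are not exactly equal.
noisy <- data.frame(
  Pair_1ID = c(1, 2),
  Pair_2ID = c(2, 1),
  CORR     = c(0.12, 0.120001)
)

# Put the smaller ID first in every row.
relabeled <- transform(noisy,
                       Pair_1ID = pmin(Pair_1ID, Pair_2ID),
                       Pair_2ID = pmax(Pair_1ID, Pair_2ID))

# unique() compares all columns, CORR included, so both rows survive.
n_unique <- nrow(unique(relabeled))

# Deduplicating on the two ID columns keeps the first row of each pair.
keep     <- !duplicated(relabeled[c("Pair_1ID", "Pair_2ID")])
n_by_key <- nrow(relabeled[keep, ])
```

Which of the two CORR values is kept depends on row order, so this assumes either copy is acceptable.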

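Another option is to build a logical row index directly. This relies on the layout of the sample data: the IDs run from 1 to the maximum, every pair is present in both orders, and the rows are sorted by Pair_1ID and then by Pair_2ID.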

maxVal <- max(sample_data$Pair_1ID)
shrtIdx <- logical(maxVal)
idx <- sapply(seq(maxVal - 1, 1), function(x) replace(shrtIdx, seq(x), TRUE))
sample_data[idx,]

#   Pair_1ID Pair_2ID CORR
# 1        1        2 0.12
# 2        1        3 0.23
# 4        2        3 0.75
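
Since this index depends entirely on row order, it may be worth verifying the layout before relying on it. A rough sketch, assuming the IDs are the integers 1 to n (expected, layout_ok, and same_rows are illustrative names):

```r
# Sample data from the question, rebuilt so the sketch is self-contained.
sample_data <- data.frame(
  Pair_1ID = c(1, 1, 2, 2, 3, 3),
  Pair_2ID = c(2, 3, 1, 3, 1, 2),
  CORR     = c(0.12, 0.23, 0.12, 0.75, 0.23, 0.75)
)
n <- max(sample_data$Pair_1ID)

# The layout the index assumes: every ordered pair (i, j) with i != j,
# sorted by the first ID and then by the second.
# expand.grid varies its first argument fastest, hence Pair_2ID comes first.
expected <- expand.grid(Pair_2ID = seq_len(n), Pair_1ID = seq_len(n))
expected <- expected[expected$Pair_1ID != expected$Pair_2ID, ]

layout_ok <- nrow(sample_data) == nrow(expected) &&
  all(sample_data$Pair_1ID == expected$Pair_1ID) &&
  all(sample_data$Pair_2ID == expected$Pair_2ID)

# When the layout holds, the index picks the same rows as the ID-order filter.
shrtIdx <- logical(n)
idx <- sapply(seq(n - 1, 1), function(x) replace(shrtIdx, seq(x), TRUE))
same_rows <- identical(rownames(sample_data[idx, ]),
                       rownames(subset(sample_data, Pair_1ID <= Pair_2ID)))
```

If layout_ok is FALSE, the subset or duplicated approaches are the safer choice, because they do not depend on row order.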

Source: https://habr.com/ru/post/1655408/

