R - checking cluster membership consistency when group labels have different names

I am trying to assign subgroup membership in 4 independent cancer gene expression data sets. I train on each data set in turn and then test (based on the metagenes) the assignment in the other three, as well as in the training cohort itself.

This gives a group membership for each sample in each comparison, from which I would like to gauge how stable each sample is (is this sample assigned to the same cluster every time?). The problem is that the group labels can differ from comparison to comparison, so comparing the group labels directly does not work.

To assess sample stability, I think I need to catalogue the subgroup memberships of each sample across comparisons, but I can't work out exactly how to do this.
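One way to do this cataloguing is to align every labeling to a reference labeling and then score each sample by how often it keeps the same aligned label. Below is a minimal base-R sketch under stated assumptions: the three runs are made-up data, run1 is arbitrarily used as the reference, and align_to() is a hypothetical helper that maps each label to the reference label it overlaps with most (a greedy majority vote, not guaranteed to be one-to-one; ties go to the first label).

```r
# Hypothetical data: three labelings of the same 12 samples, e.g. from
# classifiers trained on different cohorts; label numbers are arbitrary per run.
runs <- list(
  run1 = c(1,2,3,4,1,2,3,4,1,2,3,4),
  run2 = c(4,1,2,3,4,1,2,3,4,1,3,3),
  run3 = c(2,3,4,1,2,3,4,1,2,3,4,1)
)
runs <- lapply(runs, setNames, LETTERS[1:12])

align_to <- function(ref, y) {
  # Map each label in y to the reference label it co-occurs with most often
  # (greedy majority vote; two labels in y may map to the same reference label)
  tab <- table(y, ref)
  lookup <- colnames(tab)[apply(tab, 1, which.max)]
  names(lookup) <- rownames(tab)
  unname(lookup[as.character(y)])
}

ref <- runs[[1]]
aligned <- sapply(runs, align_to, ref = ref)  # samples x runs matrix of labels
rownames(aligned) <- names(ref)

# Stability: share of runs agreeing with each sample's most common aligned label
stability <- apply(aligned, 1, function(z) max(table(z)) / length(z))
print(round(stability, 2))  # K: 0.67, all other samples: 1
```

With real data, each sample would contribute one column per trained classifier; samples with stability well below 1 are the unstable ones.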

For what it's worth, the code below should demonstrate the problem more clearly than I have described it above.

Thanks for reading and any help is appreciated!

```r
## Here we have 12 samples (A-L), all of which have congruent assignments, except sample K.
## From the two group assignments, we can see that group 1 has become group 4 in class2,
## group 2 has become group 1, etc.
## How do we assess cluster membership with these differing subgroup labels?
class1 <- c(1,2,3,4,1,2,3,4,1,2,3,4)
class2 <- c(4,1,2,3,4,1,2,3,4,1,3,3)
names(class1) <- LETTERS[1:12]
names(class2) <- LETTERS[1:12]
```
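Comparing the labels directly and then cross-tabulating them shows both the problem and the hidden relabeling. A minimal base-R sketch (the vectors are repeated so the block runs on its own):

```r
class1 <- c(1,2,3,4,1,2,3,4,1,2,3,4)
class2 <- c(4,1,2,3,4,1,2,3,4,1,3,3)
names(class1) <- LETTERS[1:12]
names(class2) <- LETTERS[1:12]

# Naive comparison: assumes label 1 means the same thing in both runs
naive_agreement <- mean(class1 == class2)
print(naive_agreement)  # 1/12, only sample K happens to match

# Two-way table: rows = labels in class1, columns = labels in class2.
# A pure relabeling gives exactly one non-zero cell per row and per column;
# row 3 is split across two columns because of sample K.
tab <- table(class1, class2)
print(tab)
```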
2 answers

Try matchClasses from the e1071 package; some of the methods in the seriation package may also help. Either way, you first need the full two-way contingency table of the two classifications (e.g. table(class1, class2)).
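The idea behind such matching is to pair up labels so that as many samples as possible fall on the matched cells of the two-way table. As a dependency-free illustration (not e1071's implementation), here is a base-R sketch that tries every one-to-one pairing exhaustively; permutations() and match_labels() are hypothetical helpers, and the search assumes both runs use the same small number of groups, since it grows as k!.

```r
permutations <- function(v) {
  # All orderings of v, returned as a list of vectors
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

match_labels <- function(x, y) {
  # Assumes both labelings contain the same number of distinct groups
  tab <- table(x, y)
  stopifnot(nrow(tab) == ncol(tab))
  best <- NULL
  best_score <- -1
  for (p in permutations(seq_len(ncol(tab)))) {
    # p[i] is the column (label in y) paired with row i (label in x)
    score <- sum(tab[cbind(seq_len(nrow(tab)), p)])
    if (score > best_score) {
      best_score <- score
      best <- p
    }
  }
  # Lookup vector: names are labels in y, values are the matched labels in x
  setNames(rownames(tab), colnames(tab)[best])
}

class1 <- c(1,2,3,4,1,2,3,4,1,2,3,4)
class2 <- c(4,1,2,3,4,1,2,3,4,1,3,3)
names(class1) <- LETTERS[1:12]
names(class2) <- LETTERS[1:12]

mapping <- match_labels(class1, class2)
class2_relabelled <- unname(mapping[as.character(class2)])
discordant <- names(class1)[as.character(class1) != class2_relabelled]
print(mapping)
print(discordant)  # "K"
```

Once class2 is relabelled into class1's vocabulary, the per-sample comparison the question asks for becomes a plain equality test.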


Good question, and thank you for formulating it so clearly. I am currently working on clustering myself and had parked this issue to resolve later.

Here is a graphical way to solve the problem.

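```r
library(ggplot2)

# Create dummy data
# In the first instance there is a perfect transposition (A -> D, B -> A, C -> B, D -> C)
d <- data.frame(
  clust1 = LETTERS[rep(1:4, 3)],
  clust2 = LETTERS[rep(c(4, 1, 2, 3), 3)]
)

# Bubble size = number of samples in each (clust1, clust2) cell;
# after_stat(n) replaces the older ..n.. notation (ggplot2 >= 3.3.0)
ggplot(d, aes(x = clust1, y = clust2)) +
  geom_point(stat = "sum", aes(size = after_stat(n)))
```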

Perfect transposition: all bubbles are the same size

```r
# Now modify the data so that there is a single instance of imperfect matching
d$clust2[1] <- "A"

ggplot(d, aes(x = clust1, y = clust2)) +
  geom_point(stat = "sum", aes(size = after_stat(n)))
```

Imperfect transposition: bubbles of different sizes
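The bubble sizes are just the cell counts of the two-way table, so the same check can be made numerically. A minimal base-R sketch that computes, for each clust1 group, the share of its samples landing in its most common clust2 group ("purity" is a name chosen here for illustration); stringsAsFactors = FALSE is added only so the single-cell edit behaves the same in older R versions.

```r
# Same dummy data as in the plots above, including the single imperfect match
d <- data.frame(
  clust1 = LETTERS[rep(1:4, 3)],
  clust2 = LETTERS[rep(c(4, 1, 2, 3), 3)],
  stringsAsFactors = FALSE
)
d$clust2[1] <- "A"

tab <- table(d$clust1, d$clust2)  # the counts drawn as bubbles
# Share of each clust1 group that falls in its single most common clust2 group
purity <- apply(tab, 1, max) / rowSums(tab)
print(round(purity, 2))  # A: 0.67; B, C, D: 1
```

Purity alone will not flag two groups merging into one (both rows would still score 1), so a check on the columns as well, or a one-to-one matching, is needed to catch that case.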


Source: https://habr.com/ru/post/1343902/

