Find the 5 highest values less than 1, and the 5 lowest values

I have a large correlation matrix result in R. At the moment there are about 30 elements correlated against each other, so the array has about 10,000 cells. I want to find the 5 largest and 5 smallest results. How can I do this?

Here is what a small part of it, the upper-left corner, looks like:

                PL1         V3          V4         V5
    PL1  1.00000000 0.19905701 -0.02994034 -0.1533846
    V3   0.19905701 1.00000000  0.09036472  0.1306054
    V4  -0.02994034 0.09036472  1.00000000  0.1848030
    V5  -0.15338465 0.13060539  0.18480296  1.0000000

The values in the table are always between -1 and 1 and, in case it helps, because this is a correlation matrix, the half above the diagonal duplicates the half below it.

I need the 5 most positive values less than 1 and the 5 most negative values, including -1 if it exists.

Thanks in advance.

+4
6 answers

You want to find the largest and smallest correlations, and you probably want to know not only what those values are but also where in the matrix they come from. That is easy.

    x <- matrix(runif(25), 5, 5)
    cor <- cor(x)
    l <- length(cor)
    l1 <- length(cor[cor < 1])

    # the actual high and low correlation indexes
    corHigh <- order(cor)[(l1 - 4):l1]
    corLow <- order(cor)[1:5]
    # (if you just want to view the correlations, cor[corLow] or cor[corHigh] works fine)

    # isolate them in the matrix so you can see where they came from easily
    corHighView <- cor
    corHighView[!1:l %in% corHigh] <- NA
    corLowView <- cor
    corLowView[!1:l %in% corLow] <- NA

    # look at your matrix with your target correlations sticking out like a sore thumb
    corLowView
    corHighView
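
To report where the selected cells sit as row/column pairs rather than linear indexes, base R's `arrayInd()` can convert the output of `order()`. This is only a minimal sketch, assuming the matrix has dimnames; the names `m`, `idx`, and `pos` are illustrative and not part of the answer above.

```r
set.seed(42)
x <- matrix(runif(25), 5, 5, dimnames = list(NULL, paste0("V", 1:5)))
m <- cor(x)

# Linear indexes of the 5 smallest cells (same idea as corLow above)
idx <- order(m)[1:5]

# Convert the linear indexes to (row, col) positions
pos <- arrayInd(idx, dim(m))

# Show the variable names next to each selected correlation
data.frame(row = rownames(m)[pos[, 1]],
           col = colnames(m)[pos[, 2]],
           cor = m[idx])
```

Because the matrix is symmetric, a pair can show up twice here, once as (i, j) and once as (j, i).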
+2

Here's another rough way to do this (no doubt there is a much simpler way), but it's not too difficult to wrap this in a function:

EDIT: Shortened the code.

    # Simulate correlation matrix (taken from Patrick's answer)
    set.seed(1)
    n <- 100
    x <- matrix(runif(n^2), n, n)
    cor <- cor(x)

    # Set diagonal and one triangle to 0:
    diag(cor) <- 0
    cor[upper.tri(cor)] <- 0

    # Get sorted values:
    sort <- sort(cor)

    # Create a dummy matrix and get lowest 5:
    min <- matrix(cor %in% sort[1:5], n, n)
    which(min, arr.ind = TRUE)

    # Same for highest 5 (the last five sorted values):
    max <- matrix(cor %in% sort[(n^2 - 4):(n^2)], n, n)
    which(max, arr.ind = TRUE)
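
Wrapping these steps in a function could look roughly like the sketch below. The helper name `extreme_cors` and its arguments are hypothetical, and it assumes the correlation matrix has row and column names. It looks only at the lower triangle, so the diagonal is skipped and each pair is counted once.

```r
# Hypothetical helper: the n lowest and n highest correlations,
# taken from the lower triangle only (no diagonal, no duplicate pairs).
extreme_cors <- function(cm, n = 5) {
  keep <- lower.tri(cm)                  # TRUE strictly below the diagonal
  idx  <- which(keep, arr.ind = TRUE)    # (row, col) of every kept cell
  res  <- data.frame(row = rownames(cm)[idx[, "row"]],
                     col = colnames(cm)[idx[, "col"]],
                     cor = cm[keep],
                     stringsAsFactors = FALSE)
  res  <- res[order(res$cor), ]          # ascending by correlation
  list(lowest = head(res, n), highest = tail(res, n))
}

set.seed(1)
x  <- matrix(runif(100 * 20), 100, 20,
             dimnames = list(NULL, paste0("V", 1:20)))
cm <- cor(x)
ex <- extreme_cors(cm)
ex$lowest
ex$highest
```

An off-diagonal correlation of exactly 1 would still be returned by this sketch; filtering with `res$cor < 1` before sorting would exclude it.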

Another option, as ulidtko said, is to make a plot. You could try my package, qgraph, which can visualize the correlation matrix as a network:

    library(qgraph)
    qgraph(cor(x), vsize = 2, minimum = 0.2, filetype = "png")

qgraph output in PNG format

+5

Sacha's network diagram is interesting. Here it is with real data. I seem to have much stronger positive than negative correlations.


+2

Kind of dirty:

    x <- matrix(runif(25), 5, 5)
    cor <- cor(x)
    max1 <- max(cor)
    max2 <- max(cor[cor != max1])
    max3 <- max(cor[cor != max1 & cor != max2])
    max4 <- max(cor[cor != max1 & cor != max2 & cor != max3])
    max5 <- max(cor[cor != max1 & cor != max2 & cor != max3 & cor != max4])
    max6 <- max(cor[cor != max1 & cor != max2 & cor != max3 & cor != max4 & cor != max5])
    maxes <- c(max2, max3, max4, max5, max6)
    maxes
    matrix(cor %in% maxes, 5, 5)
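
The same idea, picking distinct values below the maximum of 1, can be written more compactly with `sort()` and `unique()`. This is a sketch rather than the answer's own code; the names `cm`, `vals`, `top5`, and `low5` are illustrative.

```r
set.seed(1)
x  <- matrix(runif(25), 5, 5)
cm <- cor(x)

# Distinct correlations below 1, in ascending order
vals <- sort(unique(cm[cm < 1]))

top5 <- tail(vals, 5)   # 5 largest distinct values below 1
low5 <- head(vals, 5)   # 5 smallest distinct values

# Logical matrix marking where the top-5 values sit
matrix(cm %in% top5, nrow(cm))
```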
+1

How about a nice plot? :)

    > m <- matrix(runif(100)*2-1, ncol=10)
    > colnames(m) <- rownames(m) <- paste("V", 1:10, sep="")
    > m
                 V1          V2         V3         V4         V5         V6           V7           V8         V9         V10
    V1  -0.40101571 -0.27049070  0.2414295 -0.1889384  0.6459941 -0.8851884  0.332284597 -0.431312791  0.3828374  0.46398193
    V2   0.38557771  0.37083911 -0.3004923  0.1253908 -0.4405188 -0.5424613  0.869493425  0.023291914  0.9625392 -0.83196773
    V3   0.61923503 -0.27615909  0.1759168 -0.7333568 -0.4256801 -0.6170807  0.438613391 -0.003632086  0.4113488 -0.40590330
    V4   0.72093123  0.68479573  0.5032486  0.3720876 -0.6775834  0.2445693  0.353658359 -0.839104640 -0.8122970 -0.42322187
    V5  -0.08640529  0.04432795 -0.5120129 -0.9327905 -0.5821378  0.4671473 -0.367677007  0.483375219 -0.7849003  0.57686729
    V6  -0.72451704  0.75814550  0.7838393 -0.7650238  0.6742669  0.2260757  0.001645839  0.570753074  0.1944579  0.07917656
    V7   0.64516271  0.51994540  0.9057388 -0.3976167 -0.7403159 -0.2873382 -0.809354444  0.319095368 -0.9766422 -0.71981321
    V8  -0.51509049  0.18727837 -0.1971454 -0.4290346  0.5657622  0.5324266  0.451608266 -0.715594335 -0.2749510  0.38234855
    V9   0.49035803  0.50252397  0.7736783  0.3342899 -0.2732427  0.1128947  0.870315070 -0.291482237  0.5171181 -0.59784449
    V10 -0.51811224 -0.67159723  0.8903813 -0.7562222 -0.9790557 -0.5830560 -0.715136643  0.167987391 -0.0529399  0.44570592
    > library(ggplot2)
    > library(reshape)  # provides melt(); it names the index columns X1 and X2
    > p <- ggplot(data=melt(m), aes(x=X1, y=X2, color=value))
    > p + geom_point(size=5, alpha=0.7) + scale_color_gradient2()

the plot

I don't think it would be hard to look at a 100x100 plot and spot the extreme values by eye. :)

0

I take no credit for this; I am simply posting the code in case the link dies. Credit goes to Dimitris on the r-help list. It returns the top p correlations for each variable, sorted.

    cor.mat <- cor(matrix(rnorm(100*1000), 1000, 100))
    p <- 30  # how many top items
    n <- ncol(cor.mat)
    cmat <- col(cor.mat)
    ind <- order(-cmat, cor.mat, decreasing = TRUE) - (n * cmat - n)
    dim(ind) <- dim(cor.mat)
    ind <- ind[seq(2, p + 1), ]
    out <- cbind(ID = c(col(ind)), ID2 = c(ind))
    as.data.frame(cbind(out, cor = cor.mat[out]))
0

Source: https://habr.com/ru/post/1339098/

