Rank correlation matrix in R

How can I create a rank correlation matrix in R in an elegant way, given a data frame with many columns? I could not find a built-in function, so I tried

 > test=data.frame(x=c(1,2,3,4,5), y=c(5,4,3,2,1))
 > cor(rank(test))

(only 2 columns for simplicity, real data have 5 columns), which gave

 > Error in cor(rank(test)) : supply both 'x' and 'y' or a matrix-like 'x' 

I realized that this is because rank() takes a single vector. So I tried

 > cor(lapply(test,rank)) 

to apply rank() to each column, treating the data frame as a list, but that gave the error

 > supply both 'x' and 'y' or a matrix-like 'x' 

and I finally got something working with

 > cor(data.frame(lapply(test,rank)))
    x  y
 x  1 -1
 y -1  1

However, this seems rather verbose and ugly. I think there should be a better way. If so, what is it?

1 answer

You are doing this wrong. Instead, pass the method="kendall" argument to cor() :

 R> testdf <- data.frame(x=c(1,2,3,4,5), y=c(5,4,3,2,1))
 R> cor(testdf, method="kendall")
    x  y
 x  1 -1
 y -1  1
 R>
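One point worth checking: ranking each column and then calling cor(), as in the question, corresponds to Spearman's rho rather than Kendall's tau. The two agree on perfectly monotone data such as the example above, but they generally differ. The following is a minimal sketch on a made-up data frame (`df` and its values are assumptions for illustration, not the asker's data):

```r
# Hypothetical data: y follows x except for two adjacent swaps, z is reversed
df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(2, 1, 3, 5, 4),
                 z = c(5, 4, 3, 2, 1))

kendall  <- cor(df, method = "kendall")    # Kendall's tau for every pair of columns
spearman <- cor(df, method = "spearman")   # Spearman's rho for every pair of columns

# The question's approach, slightly shortened: sapply() returns a matrix,
# so no data.frame() wrapper is needed before calling cor()
manual <- cor(sapply(df, rank))

print(kendall)    # x-y entry is 0.6
print(spearman)   # x-y entry is 0.8
print(all.equal(manual, spearman, check.attributes = FALSE))
```

If the goal is to reproduce the rank-then-correlate result, method="spearman" is the matching choice; method="kendall" gives a different, also rank-based, measure.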

From help(cor) :

For cor() , if method is "kendall" or "spearman" , Kendall's tau or Spearman's rho statistic is used to estimate a rank-based measure of association. These are more robust and have been recommended if the data do not necessarily come from a bivariate normal distribution. For cov() , a non-Pearson method is unusual but available for the sake of completeness. Note that "spearman" basically computes cor(R(x), R(y)) (or cov(., .) ) where R(u) := rank(u, na.last="keep") . In the case of missing values, the ranks are calculated depending on the value of use , either based on complete observations, or based on pairwise completeness with reranking for each pair.
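The effect of the `use` argument on missing values can be seen in a small sketch. The data frame below is made up for illustration (an assumption, not from the question); it has a single NA in column y:

```r
# Hypothetical data with one missing value in y
df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(5, NA, 3, 2, 1),
                 z = c(1, 5, 2, 3, 4))

# Default use = "everything": off-diagonal entries involving y are NA
everything <- cor(df, method = "spearman")

# Drop every row containing an NA first, then rank the remaining four rows
complete <- cor(df, method = "spearman", use = "complete.obs")

# For each pair of columns, keep the rows where both are present and re-rank
pairwise <- cor(df, method = "spearman", use = "pairwise.complete.obs")

print(everything)
print(complete)   # x-z entry is 1: row 2 is dropped for all pairs
print(pairwise)   # x-z entry is 0.4: all five rows are used for this pair
```

The x-z entry differs between the last two calls because "complete.obs" discards row 2 for every pair, while "pairwise.complete.obs" discards it only for pairs that involve y.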


Source: https://habr.com/ru/post/1481992/