Speed up rowMeans on a matrix

Consider the following matrix:

    nc <- 5000
    nr <- 1024
    m <- matrix(rnorm(nc*nr), ncol=nc)

I want to compute the difference between the rowMeans of two equally sized groups of columns, drawn at random from this matrix.

    n <- 1000  # group size
    system.time(replicate(100, {
      ind1 <- sample(seq.int(nc), n)
      ind2 <- sample(seq.int(nc), n)
      rowMeans(m[, ind1]) - rowMeans(m[, ind2])
    }))

This is pretty slow. Unfortunately, I could not make sense of the Rprof output (it seemed that most of the time was spent in is.data.frame?).

Any suggestions for something more efficient?

I have considered the following:

  • Rcpp: from what I have read online, R's rowMeans is already quite efficient, so it is not clear that this would help. I would first like to find out where the bottleneck really is; maybe my whole design is suboptimal. If most of the time is spent creating copies for each of the smaller matrices, would Rcpp do better?

  • Upgrading to R-devel: it seems there is a new .rowMeans function that is more efficient. Has anyone tried it?

Thanks.

+4
2 answers

Each rowSums() call on a subset of columns of m can be viewed as a matrix multiplication between m and a vector of 0s and 1s indicating the selected columns. If you bind all these vectors together as columns, you get a multiplication between two matrices (which is much more efficient):

    ind1 <- replicate(100, seq.int(nc) %in% sample(seq.int(nc), n))
    ind2 <- replicate(100, seq.int(nc) %in% sample(seq.int(nc), n))
    output <- m %*% (ind1 - ind2)  # differences of row sums; divide by n for means
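
As a sanity check, the sketch below compares the matrix-multiplication approach with the explicit rowMeans loop on a smaller matrix. The reduced dimensions, the seed, and the names sel1, sel2, fast and slow are assumptions made for illustration only. Dividing by n turns the difference of row sums into a difference of row means.

```r
set.seed(1)                      # assumed seed, only for reproducibility
nc <- 200; nr <- 50; n <- 40     # deliberately smaller than in the question
reps <- 10
m <- matrix(rnorm(nc * nr), ncol = nc)

# Draw the column samples once so both approaches use the same groups.
sel1 <- replicate(reps, sample(seq.int(nc), n))   # n x reps matrix of column indices
sel2 <- replicate(reps, sample(seq.int(nc), n))

# Indicator matrices (nc x reps): TRUE where a column was selected.
ind1 <- apply(sel1, 2, function(s) seq.int(nc) %in% s)
ind2 <- apply(sel2, 2, function(s) seq.int(nc) %in% s)

# One matrix product yields every difference of row sums; divide by n for means.
fast <- m %*% (ind1 - ind2) / n

# Reference: one pair of rowMeans calls per replicate, as in the question.
slow <- sapply(seq_len(reps), function(i) {
  rowMeans(m[, sel1[, i]]) - rowMeans(m[, sel2[, i]])
})

all.equal(fast, slow)            # the two should agree up to floating-point tolerance
```

Each column of fast corresponds to one replicate, so the result has nr rows and reps columns.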
+7

You do not need two calls to rowMeans. Because both groups have the same size, you can perform the subtraction first and call rowMeans on the result.

    x1 <- rowMeans(m[, ind1]) - rowMeans(m[, ind2])
    x2 <- rowMeans(m[, ind1] - m[, ind2])
    all.equal(x1, x2)
    # [1] TRUE

is.data.frame is one of the checks that rowMeans performs on its argument.
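
To see where the time actually goes, one option is to time the column subsetting and the rowMeans call separately. This is a minimal sketch using the dimensions from the question; the repetition count of 20 and the names sub, r, t_subset and t_means are assumptions for illustration, and the timings will vary by machine.

```r
set.seed(1)                               # assumed seed, only for reproducibility
nc <- 5000; nr <- 1024; n <- 1000
m <- matrix(rnorm(nc * nr), ncol = nc)
ind1 <- sample(seq.int(nc), n)

# Time only the column subsetting (each pass copies a 1024 x 1000 block).
t_subset <- system.time(for (i in 1:20) sub <- m[, ind1])

# Time only rowMeans on the block that has already been extracted.
t_means <- system.time(for (i in 1:20) r <- rowMeans(sub))

# Elapsed seconds for each step, side by side.
c(subset = t_subset[["elapsed"]], rowMeans = t_means[["elapsed"]])
```

If the subsetting step dominates, the cost lies in copying the submatrices rather than in rowMeans itself.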

UPDATE: Regarding .rowMeans in R-devel, it looks like it is just a direct call to the internal code (assuming do_colsum hasn't changed). It is defined as:

    .rowMeans <- function(X, m, n, na.rm = FALSE)
      .Internal(rowMeans(X, m, n, na.rm))

In your case, m = 1024 (the number of rows) and n = 1000 (the number of columns of the subset).
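
A minimal sketch of calling .rowMeans directly, assuming an R version that provides it. The smaller matrix, the seed, and the names sub, a and b are illustrative assumptions. Since .rowMeans skips the argument checks, the caller is responsible for passing the correct dimensions.

```r
set.seed(1)                                   # assumed seed, only for reproducibility
m <- matrix(rnorm(200 * 50), ncol = 200)      # 50 rows, 200 columns
ind1 <- sample(seq.int(ncol(m)), 40)

sub <- m[, ind1]                              # 50 x 40 submatrix

a <- rowMeans(sub)                            # version with argument checks
b <- .rowMeans(sub, nrow(sub), ncol(sub))     # bare version: m = rows, n = columns

all.equal(a, b)
```

The subsetting m[, ind1] still copies the data in both cases, so any gain from .rowMeans is limited to the skipped checks.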

+4

Source: https://habr.com/ru/post/1398726/

