R: eliminating a loop and speeding up code

I would like to speed up my calculations and get the result without the for loop in the function m. Reproducible example:

    N <- 2500
    n <- 500
    r <- replicate(1000, sample(N, n))

    m <- function(r, N) {
      ic <- matrix(0, nrow = N, ncol = N)
      for (i in 1:ncol(r)) {
        p <- r[, i]
        ic[p, p] <- ic[p, p] + 1
      }
      ic
    }

    system.time(ic <- m(r, N))
    #    user  system elapsed
    #    6.25    0.51    6.76

    isSymmetric(ic)
    # [1] TRUE

Each iteration of the for loop updates a matrix rather than a vector, so how can this be vectorized?

@joel.wilson The purpose of this function is to count how often each pair of elements is sampled together. From those counts we can then estimate the pairwise inclusion probabilities.
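To make that last step concrete, here is a minimal sketch of turning the pair counts into estimated inclusion probabilities by dividing by the number of replicates. The small sizes N = 6, n = 3 and the replicate count B are assumptions chosen only for illustration; the comparison values n/N and n(n-1)/(N(N-1)) are the standard ones for simple random sampling without replacement, which is what sample(N, n) performs.

```r
set.seed(1)
N <- 6; n <- 3; B <- 2000          # illustrative sizes (assumed, not from the question)
r <- replicate(B, sample(N, n))    # n x B matrix, one sample per column

# Same counting loop as m() above
ic <- matrix(0, nrow = N, ncol = N)
for (i in seq_len(ncol(r))) {
  p <- r[, i]
  ic[p, p] <- ic[p, p] + 1
}

# Diagonal: estimated first-order inclusion probabilities
# Off-diagonal: estimated pairwise (second-order) inclusion probabilities
pi_hat <- ic / B

first_order  <- mean(diag(pi_hat))                 # compare with n / N
second_order <- mean(pi_hat[upper.tri(pi_hat)])    # compare with n*(n-1) / (N*(N-1))
```

The individual entries of pi_hat vary from run to run, but their averages match the theoretical values because every replicate contributes exactly n diagonal and n(n-1) off-diagonal counts.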

Thanks, @Hasha and @alexis_laz. Benchmark results:

    > require(rbenchmark)
    > benchmark(m(r, N),
    +           m1(r, N),
    +           mvec(r, N),
    +           alexis(r, N),
    +           replications = 10, order = "elapsed")
              test replications elapsed relative user.self sys.self user.child sys.child
    4 alexis(r, N)           10    4.73    1.000      4.63     0.11         NA        NA
    3   mvec(r, N)           10    5.36    1.133      5.18     0.18         NA        NA
    2     m1(r, N)           10    5.48    1.159      5.29     0.19         NA        NA
    1      m(r, N)           10   61.41   12.983     60.43     0.90         NA        NA
1 answer

This should be significantly faster, since it avoids indexing the matrix on both dimensions (ic[p, p]) in every iteration.

    m1 <- function(r, N) {
      ic <- matrix(0, nrow = N, ncol = ncol(r))
      for (i in 1:ncol(r)) {
        p <- r[, i]
        ic[, i][p] <- 1
      }
      tcrossprod(ic)
    }

    system.time(ic1 <- m1(r, N))
    #    user  system elapsed
    #    0.53    0.01    0.55

    all.equal(ic, ic1)
    # [1] TRUE
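Why m1 agrees with m: if X is the N x (number of samples) indicator matrix with X[k, i] = 1 when element k is in sample i, then entry [j, k] of X %*% t(X) sums X[j, i] * X[k, i] over all samples i, which is the number of samples containing both j and k. This relies on each sample having no repeated elements, as is the case with sample(N, n). A small check on made-up data (r_small is purely illustrative):

```r
# Three samples of size 2 drawn from 1:3 (hypothetical toy data)
r_small <- cbind(c(1, 2), c(2, 3), c(1, 2))
N_small <- 3

# Indicator matrix: one column per sample
X <- matrix(0, nrow = N_small, ncol = ncol(r_small))
for (i in seq_len(ncol(r_small))) {
  X[r_small[, i], i] <- 1
}

pair_counts <- tcrossprod(X)   # same as X %*% t(X)
pair_counts
#      [,1] [,2] [,3]
# [1,]    2    2    0
# [2,]    2    3    1
# [3,]    0    1    1
```

Element 2 appears in all three samples (diagonal entry 3), while elements 1 and 3 never appear together (entry 0).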

Simple tally/add operations can almost always be vectorized:

    mvec <- function(r, N) {
      ic <- matrix(0, nrow = N, ncol = ncol(r))
      i <- rep(1:ncol(r), each = nrow(r))
      ic[cbind(as.vector(r), i)] <- 1
      tcrossprod(ic)
    }
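mvec works because R allows indexing a matrix by a two-column matrix, where each row of the index is a (row, column) pair. as.vector(r) unrolls r column by column, so rep(..., each = nrow(r)) pairs every sampled element with the number of the sample it came from. A small illustration on made-up data (r_small is hypothetical):

```r
# Three samples of size 2 drawn from 1:3 (hypothetical toy data)
r_small <- cbind(c(1, 2), c(2, 3), c(1, 2))

# One (element, sample) pair per row
idx <- cbind(as.vector(r_small),
             rep(seq_len(ncol(r_small)), each = nrow(r_small)))

X <- matrix(0, nrow = 3, ncol = ncol(r_small))
X[idx] <- 1    # sets all six cells in one assignment, no loop
X
#      [,1] [,2] [,3]
# [1,]    1    0    1
# [2,]    1    1    1
# [3,]    0    1    0
```

This is the same indicator matrix that the loop in m1 builds column by column.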

Source: https://habr.com/ru/post/1012718/

