Calculating sums over consecutive stretches of a vector in R

I have a long vector x and another vector v that contains lengths. I would like to sum x so that the answer y is a vector of length length(v), where y[1] is sum(x[1:v[1]]), y[2] is sum(x[(1+v[1]):(v[1]+v[2])]), etc. In effect, this is a sparse matrix multiplication from a space of dimension length(x) to one of dimension length(v). However, I would prefer not to bring in "advanced machinery", although I may have to. This needs to be very, very fast. Can anyone think of something simpler than using a sparse matrix package?

Example -

 x <- c(1,1,3,4,5)
 v <- c(2,3)
 y <- myFunc(x,v)

y should be c(2,12)

I am open to any preprocessing, for example storing in v the starting indices of each stretch instead of the lengths.
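The matrix view mentioned in the question can be made concrete with a small dense example. This is only an illustrative sketch: the names g and M are assumptions introduced here, and a dense indicator matrix like this would be far too large for a long x.

```r
x <- c(1, 1, 3, 4, 5)
v <- c(2, 3)

# g[j] is the index of the stretch that x[j] belongs to: 1 1 2 2 2
g <- rep(seq_along(v), v)

# M is a length(v) x length(x) 0/1 matrix; row i marks the elements of stretch i
M <- outer(seq_along(v), g, "==") * 1

y <- as.vector(M %*% x)
y  # 2 12
```

Each row of M has ones only over one contiguous stretch, which is why the product can be computed without ever forming the matrix.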

4 answers
 y <- cumsum(x)[cumsum(v)]
 y <- c(y[1], diff(y))

It looks like this does extra work, because it computes cumsum over the whole vector, but it is actually faster than the other solutions so far, for both a small and a large number of groups.
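To see why the two lines above work, here is a small sketch on the example from the question. The wrapper name stretch_sums is hypothetical, and the sketch assumes every entry of v is at least 1 and that sum(v) equals length(x).

```r
# Hypothetical helper wrapping the cumsum approach.
# Assumes all v[i] >= 1 and sum(v) == length(x).
stretch_sums <- function(x, v) {
  ends <- cumsum(v)           # last index of each stretch
  totals <- cumsum(x)[ends]   # running total of x at the end of each stretch
  c(totals[1], diff(totals))  # each stretch sum is the increase since the previous end
}

x <- c(1, 1, 3, 4, 5)
v <- c(2, 3)

cumsum(x)           # 1 2 5 9 14
cumsum(v)           # 2 5
stretch_sums(x, v)  # 2 12
```

The running total at the end of a stretch minus the running total at the end of the previous stretch is exactly the sum of that stretch, so a single pass over x is enough.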

This is how I simulated the data:

 set.seed(5)
 N <- 1e6
 n <- 10
 x <- round(runif(N,0,100),1)
 v <- as.vector(table(sample(n, N, replace=TRUE)))

On my machine, timings with n <- 10 are:

  • Brandon Bertelsen (for loop): 0.017
  • Ramnath (rowsum): 0.057
  • John (split / apply): 0.280
  • Aaron (cumsum): 0.008

Changing to n <- 1e5, the timings are:

  • Brandon Bertelsen (for loop): 2.181
  • Ramnath (rowsum): 0.226
  • John (split / apply): 0.852
  • Aaron (cumsum): 0.015
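A comparison along these lines could be set up as follows. This is only a sketch using system.time from base R: the exact timing code behind the numbers above is not shown in the answer, the variable names are assumptions, and the results will differ from machine to machine.

```r
set.seed(5)
N <- 1e6
n <- 10
x <- round(runif(N, 0, 100), 1)
v <- as.vector(table(sample(n, N, replace = TRUE)))

# Group label for every element of x, used by the rowsum and split variants
g <- rep(seq_along(v), v)

t_cumsum <- system.time({
  cs <- cumsum(x)[cumsum(v)]
  y_cumsum <- c(cs[1], diff(cs))
})
t_rowsum <- system.time(y_rowsum <- as.vector(rowsum(x, g)))
t_split  <- system.time(y_split <- unname(sapply(split(x, g), sum)))

# The variants should agree up to floating-point rounding
stopifnot(isTRUE(all.equal(y_cumsum, y_rowsum)))
stopifnot(isTRUE(all.equal(y_cumsum, y_split)))

# Elapsed time of each variant
print(rbind(cumsum = t_cumsum, rowsum = t_rowsum, split = t_split)[, "elapsed"])
```

Checking that all variants return the same sums before comparing their speed guards against timing a solution that is fast but wrong.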

I suspect that this is faster than performing a matrix multiplication, even with a sparse matrix package, because you do not need to form a matrix or do any multiplication. If more speed is required, I suspect that you can get it by writing the loop in C; that is not hard to do with inline and Rcpp, but I will leave it to you.
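For reference, a compiled version might look roughly like the sketch below. It is not part of the original answer and has not been timed here; it assumes the Rcpp package and a working C++ toolchain are installed, and the function name stretch_sums_cpp is hypothetical.

```r
library(Rcpp)

# Compile a small C++ function that walks x once, stretch by stretch.
cppFunction('
NumericVector stretch_sums_cpp(NumericVector x, IntegerVector v) {
  R_xlen_t pos = 0;                 // current position in x
  int n = v.size();
  NumericVector y(n);
  for (int i = 0; i < n; ++i) {
    // Refuse lengths that would read past the end of x
    if (v[i] < 0 || pos + v[i] > x.size()) stop("lengths in v do not fit x");
    double s = 0.0;
    for (int j = 0; j < v[i]; ++j) s += x[pos++];
    y[i] = s;
  }
  return y;
}
')

stretch_sums_cpp(c(1, 1, 3, 4, 5), c(2, 3))  # 2 12
```

Unlike the cumsum version, this sums each stretch directly, so it avoids subtracting two large running totals, and a zero-length stretch simply yields 0.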


You can do this using rowsum. It should be fast enough, since it uses C code in the background.

 y <- rowsum(x, rep(1:length(v), v)) 
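As a small illustration on the example from the question (the variable g is introduced here only for readability), note that rowsum returns a one-column matrix with the group labels as row names, so as.vector is needed if a plain vector is wanted:

```r
x <- c(1, 1, 3, 4, 5)
v <- c(2, 3)

# Group label for every element of x: 1 1 2 2 2
g <- rep(seq_along(v), v)

y <- rowsum(x, g)  # 2 x 1 matrix with row names "1" and "2"
as.vector(y)       # 2 12
```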

This is a little different.

 s <- rep(1:length(v), v)
 l <- split(x, s)
 y <- sapply(l, sum)

Try something like:

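 ends <- cumsum(v)        # v holds lengths, so convert to end indices first
 y <- numeric(length(v))  # preallocate the result
 for (i in seq_along(v)) {
   y[i] <- sum(x[(ends[i] - v[i] + 1):ends[i]])
 }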

Source: https://habr.com/ru/post/1379233/

