Calculating moving sums of stretches of a vector with R

Question


I have a long vector x, and another vector v that contains lengths. I would like to sum x so that the answer y is a vector of length length(v), where y[1] is sum(x[1:v[1]]), y[2] is sum(x[(1+v[1]):(v[1]+v[2])]), and so on. Essentially, this is a sparse matrix multiplication from a space of dimension length(x) to one of dimension length(v). However, I would prefer not to bring in “advanced machinery”, although I may have to. This needs to be very, very fast. Can anyone think of something simpler than using a sparse matrix package?

Example:

 x <- c(1,1,3,4,5)
 v <- c(2,3)
 y <- myFunc(x,v)

y should be c(2,12)

I am open to any preprocessing, for example storing the starting index of each stretch in v.
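To make the matrix view mentioned in the question concrete, here is a minimal base-R sketch on the example above. The names g and M are illustrative and not from the original post, and the indicator matrix is built densely only for readability; a genuinely sparse representation would need a package such as Matrix.

```r
x <- c(1, 1, 3, 4, 5)
v <- c(2, 3)

# Group label for every element of x: 1 1 2 2 2
g <- rep(seq_along(v), v)

# M[i, j] is 1 when x[j] belongs to stretch i (dense, for illustration only)
M <- outer(seq_along(v), g, "==") * 1

# Multiplying by x sums each stretch; drop() turns the 2 x 1 result into a vector
y <- drop(M %*% x)
y
```

For a long x the dense matrix has length(v) * length(x) entries, which is why the question frames it as a sparse problem.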

+4

matrix r multiplication

ryan Nov 01 '11 at 23:50


4 answers

You can do this using rowsum. It should be fast enough, since it uses C code in the background.

 y <- rowsum(x, rep(1:length(v), v))
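As a quick check of this approach on the question's example (a sketch, with g as an illustrative name for the grouping vector): rowsum returns a one-column matrix rather than a plain vector, so as.vector is used here to get the shape the question asks for.

```r
x <- c(1, 1, 3, 4, 5)
v <- c(2, 3)

# One group label per element of x: 1 1 2 2 2
g <- rep(seq_along(v), v)

# rowsum() returns a matrix with one row per group
y_mat <- rowsum(x, g)

# Strip dimensions and row names to get a plain numeric vector
y <- as.vector(y_mat)
y
```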

+1

Ramnath Nov 01 '11 at 23:59


This is a little different.

 s <- rep(1:length(v), v)
 l <- split(x, s)
 y <- sapply(l, sum)

+1

John Nov 02 '11 at 0:40


Try something like:

 ends <- cumsum(v)  # last index of each stretch; assumes every v[i] >= 1
 y <- numeric(length(v))
 for (i in seq_along(v)) y[i] <- sum(x[(ends[i] - v[i] + 1):ends[i]])

0

Brandon Bertelsen Nov 01 '11 at 23:54


Aaron · Accepted Answer · 2011-11-02T01:17:29+0000

  y <- cumsum(x)[cumsum(v)]
  y <- c(y[1], diff(y))

It seems like this is doing extra work because it computes cumsum for the whole vector, but it is actually faster than the other solutions so far, for both a small and a large number of groups.
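To see why the two lines work, here is a short walk-through on the question's example. The wrapper name stretch_sums is hypothetical, and the sketch assumes every entry of v is at least 1 (a zero-length stretch at the start would produce index 0, which R silently drops).

```r
# Hypothetical helper wrapping the two lines from this answer
stretch_sums <- function(x, v) {
  ends <- cumsum(v)            # last index of each stretch
  totals <- cumsum(x)[ends]    # running total of x at the end of each stretch
  c(totals[1], diff(totals))   # consecutive differences give per-stretch sums
}

x <- c(1, 1, 3, 4, 5)
v <- c(2, 3)

cumsum(x)            # 1 2 5 9 14
cumsum(v)            # 2 5
y <- stretch_sums(x, v)
y                    # 2 12
```

Because each result is a difference of two running totals, very long vectors with large values may accumulate slightly more floating-point rounding than summing each stretch directly.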

This is how I simulated the data:

 set.seed(5)
 N <- 1e6
 n <- 10
 x <- round(runif(N, 0, 100), 1)
 v <- as.vector(table(sample(n, N, replace = TRUE)))

On my machine, timings with n <- 10 are:

Brandon Bertelsen (for loop): 0.017
Ramnath (rowsum): 0.057
John (split/sapply): 0.280
Aaron (cumsum): 0.008

Changing to n <- 1e5, the timings are:

Brandon Bertelsen (for loop): 2.181
Ramnath (rowsum): 0.226
John (split/sapply): 0.852
Aaron (cumsum): 0.015
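The answer does not show how the timings were taken. One way to reproduce a comparison of this kind, assuming base R's system.time and the simulated data from this answer, is sketched below for the rowsum and cumsum approaches. The variable names are illustrative, and the numbers will differ from machine to machine.

```r
set.seed(5)
N <- 1e6
n <- 10
x <- round(runif(N, 0, 100), 1)
v <- as.vector(table(sample(n, N, replace = TRUE)))

# Group label for every element of x
g <- rep(seq_along(v), v)

# Time the rowsum approach
t_rowsum <- system.time(y_rowsum <- as.vector(rowsum(x, g)))

# Time the cumsum approach
t_cumsum <- system.time({
  y_cumsum <- cumsum(x)[cumsum(v)]
  y_cumsum <- c(y_cumsum[1], diff(y_cumsum))
})

# The two results should agree up to floating-point rounding
all.equal(y_rowsum, y_cumsum)

# Elapsed seconds for each approach
c(rowsum = t_rowsum[["elapsed"]], cumsum = t_cumsum[["elapsed"]])
```

Checking that the results agree before comparing times guards against benchmarking a variant that is fast only because it computes something different.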

I suspect that this is faster than performing matrix multiplication, even with a sparse matrix package, because you do not need to form a matrix or do any multiplication. If more speed is required, I suspect that you could get it by writing this in C; that is not hard to do with inline and Rcpp, but I will leave it to you.
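As a rough idea of what the compiled version might look like, here is a minimal sketch using Rcpp's cppFunction. It is not from the original answer: the function name stretch_sums_cpp is hypothetical, it needs the Rcpp package and a working C++ compiler, and whether it actually beats the cumsum version would have to be measured.

```r
library(Rcpp)

# Compile a small C++ function that walks x once, summing each stretch in turn
cppFunction('
NumericVector stretch_sums_cpp(NumericVector x, IntegerVector v) {
  R_xlen_t n = x.size();
  int k = v.size();
  NumericVector y(k);
  R_xlen_t pos = 0;  // current position in x
  for (int i = 0; i < k; ++i) {
    // Refuse negative lengths or lengths that run past the end of x
    if (v[i] < 0 || pos + v[i] > n) stop("lengths in v do not fit x");
    double s = 0.0;
    for (int j = 0; j < v[i]; ++j) s += x[pos++];
    y[i] = s;
  }
  return y;
}')

y <- stretch_sums_cpp(c(1, 1, 3, 4, 5), c(2, 3))
y
```

Unlike the cumsum version, this sums each stretch directly, so zero-length stretches simply yield 0 and no running total is differenced.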
