y <- cumsum(x)[cumsum(v)]
y <- c(y[1], diff(y))
The cumsum approach looks like it does extra work, because it computes the cumulative sum of the whole vector, but it is actually faster than the other solutions posted so far, for both a small and a large number of groups.
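To see why the two lines give per-group sums, here is a small worked example. The toy `x` and `v` and the helper names `cs`, `ends` and `g` are mine, made up only for illustration; the `tapply` line is just a cross-check, not one of the timed solutions.

```r
# Toy data: six values split into consecutive groups of sizes 2, 1 and 3.
x <- c(1, 2, 3, 4, 5, 6)
v <- c(2, 1, 3)

cs   <- cumsum(x)        # running totals: 1 3 6 10 15 21
ends <- cumsum(v)        # index of the last element of each group: 2 3 6
y <- cs[ends]            # running total at the end of each group: 3 6 21
y <- c(y[1], diff(y))    # subtract the previous group's end total: 3 3 15

# Cross-check against a direct per-group sum.
g <- rep(seq_along(v), v)   # group labels: 1 1 2 3 3 3
stopifnot(isTRUE(all.equal(y, as.vector(tapply(x, g, sum)))))
```

Because each group sum comes out as a difference of two running totals, it can differ from a direct sum by floating-point rounding, which is why the check uses `all.equal` rather than `identical`.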
This is how I simulated the data:
set.seed(5)
N <- 1e6
n <- 10
x <- round(runif(N, 0, 100), 1)
v <- as.vector(table(sample(n, N, replace = TRUE)))
On my machine, timings with n <- 10 are:
- Brandon Bertelsen (for loop): 0.017
- Ramnath (rowsum): 0.057
- John (split / apply): 0.280
- Aaron (cumsum): 0.008
With n <- 1e5, the timings are:
- Brandon Bertelsen (for loop): 2.181
- Ramnath (rowsum): 0.226
- John (split / apply): 0.852
- Aaron (cumsum): 0.015
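If you want to run this kind of comparison yourself, something along these lines should work. This is only a sketch: I am assuming `system.time` as the timer, the wrapper names `cumsum_sums` and `rowsum_sums` are made up, and the `rowsum` variant is my guess at what a rowsum-based solution looks like, not necessarily the exact code timed above. Your numbers will differ from mine.

```r
set.seed(5)
N <- 1e6
n <- 10                      # change to 1e5 for the many-groups case
x <- round(runif(N, 0, 100), 1)
v <- as.vector(table(sample(n, N, replace = TRUE)))

# cumsum approach: running totals picked at group ends, then differenced.
cumsum_sums <- function(x, v) {
  y <- cumsum(x)[cumsum(v)]
  c(y[1], diff(y))
}

# rowsum approach (assumed form): expand v into group labels, sum by label.
rowsum_sums <- function(x, v) {
  g <- rep(seq_along(v), v)
  as.vector(rowsum(x, g))
}

t_cumsum <- system.time(y1 <- cumsum_sums(x, v))
t_rowsum <- system.time(y2 <- rowsum_sums(x, v))

# Both approaches should agree up to floating-point rounding.
stopifnot(isTRUE(all.equal(y1, y2)))

# Elapsed seconds for each approach.
print(c(cumsum = t_cumsum[["elapsed"]], rowsum = t_rowsum[["elapsed"]]))
```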
I suspect this is faster than matrix multiplication, even with a sparse matrix package, because you do not need to build a matrix or do any multiplication at all. If you need more speed, I suspect writing it in C would help; that is not hard to do with inline and Rcpp, but I will leave that to you.
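For what it is worth, a compiled version might look roughly like the following. This is an untimed sketch, not something I benchmarked: it assumes the Rcpp package and a working C++ compiler are installed, and the function name `group_sums_cpp` is made up. It sums each consecutive group directly in one pass instead of differencing a cumulative sum.

```r
library(Rcpp)

# Hypothetical helper: sum consecutive runs of x whose lengths are given by v.
cppFunction('
NumericVector group_sums_cpp(NumericVector x, IntegerVector v) {
  int k = v.size();
  NumericVector out(k);
  R_xlen_t pos = 0;                       // current position in x
  for (int i = 0; i < k; ++i) {
    double s = 0.0;
    for (int j = 0; j < v[i]; ++j) {
      if (pos >= x.size()) stop("group sizes exceed length of x");
      s += x[pos++];                      // accumulate this group
    }
    out[i] = s;
  }
  return out;
}')

# Same toy data as a quick sanity check: groups of sizes 2, 1 and 3.
print(group_sums_cpp(c(1, 2, 3, 4, 5, 6), c(2L, 1L, 3L)))
```

Whether this actually beats the cumsum version would need to be measured; both make a single pass over x.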