A faster method for large data sets (20 to 300 times faster than the methods above) is to cast the vector as a matrix and then use colSums.
> colSums( matrix( v, nrow = 10, ncol = 10 ) )
[1] 55 155 255 355 455 555 655 755 855 955
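One caveat worth noting (this sketch is an addition, not part of the original benchmark): the matrix trick assumes that the length of v is an exact multiple of the group size, because matrix() otherwise recycles the vector to fill the last column. If it is not, one option is to pad with NA and pass na.rm = TRUE to colSums, shown here with a hypothetical 95-element vector:

> v_ragged = 1:95
> pad = ceiling( length( v_ragged ) / 10 ) * 10 - length( v_ragged )
> colSums( matrix( c( v_ragged, rep( NA, pad ) ), nrow = 10 ), na.rm = TRUE )
[1] 55 155 255 355 455 555 655 755 855 465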
Consider a larger data set:
> n_per_group = 1e3
> n_groups = 1e3
> v = 1:(n_per_group * n_groups)
Using the matrix method takes about 5 ms:
> start = Sys.time()
> r1 = colSums( matrix( v, nrow = n_per_group, ncol = n_groups ) )
> end = Sys.time()
> end - start
Time difference of 0.005604982 secs
Using the tapply method takes about 601 ms:
> start = Sys.time()
> r2 = as.numeric( tapply( v, (seq_along( v ) - 1) %/% n_per_group, sum ) )
> end = Sys.time()
> end - start
Time difference of 0.6015229 secs
> all.equal( r1, r2 )
[1] TRUE
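The grouping index used here may deserve a quick illustration (added, not part of the original): integer division of the zero-based positions by n_per_group assigns each run of n_per_group consecutive elements the same group id, starting from 0. On a small example with groups of 5:

> (seq_along( 1:25 ) - 1) %/% 5
 [1] 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4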
Using the by method takes about 103 ms:
> start = Sys.time()
> idx = as.factor( rep( seq( n_groups ), each = n_per_group ) )
> r3 = as.numeric( by( v, idx, sum ) )
> end = Sys.time()
> end - start
Time difference of 0.1034958 secs
> all.equal( r1, r3 )
[1] TRUE
Using the data frame (aggregate) method takes about 1675 ms:
> start = Sys.time()
> dat <- data.frame( v = v, cat = cut( v, seq( 0, n_per_group * n_groups, by = n_per_group ) ) )
> r4 = aggregate( v ~ cat, data = dat, sum )$v
> end = Sys.time()
> end - start
Time difference of 1.675465 secs
> all.equal( r1, r4 )
[1] TRUE
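How cut() builds the grouping factor is easier to see on a toy input (an added illustration, not part of the original): values are binned into half-open intervals of width n_per_group, here shown with a width of 5:

> cut( c( 1, 5, 6, 10, 11 ), seq( 0, 15, by = 5 ) )
[1] (0,5]   (0,5]   (5,10]  (5,10]  (10,15]
Levels: (0,5] (5,10] (10,15]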
and using the sparse matrix method takes about 334 ms:
> library( Matrix )
> start = Sys.time()
> f = gl( n_groups, n_per_group )
> r5 = as( f, "sparseMatrix" ) %*% v
> r5 = as.numeric( r5[ , 1 ] )
> end = Sys.time()
> end - start
Time difference of 0.334847 secs
> all.equal( r1, r5 )
[1] TRUE
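Why this works may be clearer on a small factor (an added sketch, not part of the original): the coercion turns the factor into a sparse indicator matrix with one row per level, so the matrix-vector product computes one group sum per row:

> f_small = gl( 3, 2 )                      # hypothetical small factor: 3 groups of 2
> as( f_small, "sparseMatrix" ) %*% (1:6)   # row i holds the sum of group i: 3, 7, 11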