How to sum grouped elements of a vector in R

Say I have this vector

v <- c(1:100) 

And I want to get the following:

 b[1] = sum(v[c(1:10)])
 b[2] = sum(v[c(11:20)])
 ...

I can write a loop to solve this problem, but I'm sure there is an "R way" that would look something like:

 b <- groupedSum(v, 10) 

where b is a vector holding the sum of each group of 10 elements. What is the R way to do this?

5 answers
 > tapply(v, (seq_along(v) - 1) %/% 10, sum)
   0   1   2   3   4   5   6   7   8   9
  55 155 255 355 455 555 655 755 855 955

If there are NA values, you may need to add na.rm = TRUE to the argument list after sum.
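
As a small illustration of that note, the sketch below puts a missing value into the first group (the NA position is made up for the example) and forwards na.rm = TRUE through tapply to sum:

```r
# Example data with one missing value (position chosen arbitrarily)
v <- c(1:100)
v[3] <- NA

# Group index: 0 for elements 1-10, 1 for 11-20, and so on
grp <- (seq_along(v) - 1) %/% 10

# Without na.rm, the group containing the NA sums to NA
with_na <- tapply(v, grp, sum)

# Extra arguments given to tapply are passed on to sum
without_na <- tapply(v, grp, sum, na.rm = TRUE)
```

Here only the first group is affected; the other nine sums are the same in both results.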

Comments: I think Tyler's approach is better because he provided better documentation. It does suffer from having to work with the vagaries of the cut() function, which I have always felt has the wrong defaults. To create groups that captured all of 1:100, he needed to use a vector element of 101. But that is not Tyler's fault. Send him more votes; his answer is better.

If gsk can work with by objects without running into difficulties with their class, then he is better at this than I am. The result looks like a list, but it is really something else. Using his example:

 > is.list(by(v, idx, sum))
 [1] FALSE
 > is.matrix(by(v, idx, sum))
 [1] FALSE
 > is.vector(by(v, idx, sum))
 [1] FALSE

I think by objects are somewhat like named vectors and somewhat like matrices, but their failure to inherit from the matrix class has always confused me.


Step 1: Create an index for the groups:

 N <- 50
 size <- 10  # Size of a group
 v <- seq(N)
 idx <- as.factor(rep(seq(N/size), each = size))

Step 2: Use any of a number of vectorized tools (by, plyr, etc.) to sum over the groups:

 by(v,idx,sum) 

Step 3: Profit

 idx: 1
 [1] 55
 ---------------------------------------------------------------------------------
 idx: 2
 [1] 155
 ---------------------------------------------------------------------------------
 idx: 3
 [1] 255
 ---------------------------------------------------------------------------------
 idx: 4
 [1] 355
 ---------------------------------------------------------------------------------
 idx: 5
 [1] 455
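
The index-then-sum steps can also be wrapped in a function shaped like the one the question asks for. This is only a sketch: groupedSum is a hypothetical helper, not something from base R or a package, and building the index with ceiling() instead of rep() is my own choice so that a vector whose length is not a multiple of the group size still works.

```r
# Hypothetical helper named after the function the question wishes for
groupedSum <- function(v, size) {
  # Group index: 1 for the first `size` elements, 2 for the next, ...
  idx <- ceiling(seq_along(v) / size)
  # tapply returns a named 1-d array; as.vector drops names and dim
  as.vector(tapply(v, idx, sum))
}

b <- groupedSum(1:100, 10)

# A trailing partial group is summed on its own
b_short <- groupedSum(1:25, 10)
```

With 25 elements and groups of 10, the last entry of b_short covers only elements 21 to 25.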

There are already two good methods. I suggest using cut() so that the output shows the range of each group:

 v <- c(1:100)
 dat <- data.frame(v = v, cat = cut(v, seq(0, 100, by = 10)))
 aggregate(v ~ cat, data = dat, sum)

Yielding:

         cat   v
 1    (0,10]  55
 2   (10,20] 155
 3   (20,30] 255
 4   (30,40] 355
 5   (40,50] 455
 6   (50,60] 555
 7   (60,70] 655
 8   (70,80] 755
 9   (80,90] 855
 10 (90,100] 955
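
The interval labels show that cut() uses right-closed intervals by default, which is why the breaks start at 0 rather than 1. The sketch below illustrates this with labels = FALSE, so that cut() returns plain integer group codes; the variable names are made up for the example.

```r
v <- c(1:100)
breaks <- seq(0, 100, by = 10)

# Default intervals are (0,10], (10,20], ..., so 10 falls in group 1 and 11 in group 2
grp <- cut(v, breaks, labels = FALSE)

# Starting the breaks at 1 leaves v = 1 outside every interval, giving NA
grp_bad <- cut(v, seq(1, 101, by = 10), labels = FALSE)

# The integer codes can be used directly as a grouping index
sums <- as.vector(tapply(v, grp, sum))
```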

A faster method for large data sets (roughly 20 to 300 times faster than the methods above) is to reshape the vector into a matrix and then use colSums.

 > colSums(matrix(v, nrow = 10, ncol = 10))
 [1]  55 155 255 355 455 555 655 755 855 955
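
The matrix approach relies on matrix() filling column by column, so each column holds one group, and it needs the vector length to be a multiple of the group size. One way around that restriction, sketched here with a hypothetical helper groupedSumMatrix that is not part of the original answer, is to pad the last group with zeros first:

```r
# Hypothetical helper: grouped sums via a matrix, zero-padding the last group
groupedSumMatrix <- function(v, size) {
  n_groups <- ceiling(length(v) / size)
  # Zeros do not change a sum, so they are safe as padding
  padded <- c(v, rep(0, n_groups * size - length(v)))
  # matrix() fills column by column, so each column is one group
  colSums(matrix(padded, nrow = size, ncol = n_groups))
}

s1 <- groupedSumMatrix(1:100, 10)
s2 <- groupedSumMatrix(1:25, 10)
```

Zero padding is only appropriate for sums; other summaries such as means would need different handling of the partial group.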

Consider a larger dataset

 > n_per_group = 1e3
 > n_groups = 1e3
 > v = 1:(n_per_group * n_groups)

Using the matrix method takes about 5.6 ms:

 > start = Sys.time()
 > r1 = colSums(matrix(v, nrow = n_per_group, ncol = n_groups))
 > end = Sys.time()
 > end - start
 Time difference of 0.005604982 secs

Using the tapply method takes about 601 ms:

 > start = Sys.time()
 > r2 = as.numeric(tapply(v, (seq_along(v) - 1) %/% n_per_group, sum))
 > end = Sys.time()
 > end - start
 Time difference of 0.6015229 secs
 > all.equal(r1, r2)
 [1] TRUE

Using the by method takes about 103 ms:

 > start = Sys.time()
 > idx = as.factor(rep(seq(n_groups), each = n_per_group))
 > r3 = as.numeric(by(v, idx, sum))
 > end = Sys.time()
 > end - start
 Time difference of 0.1034958 secs
 > all.equal(r1, r3)
 [1] TRUE

Using the data frame (aggregate) method takes about 1675 ms:

 > start = Sys.time()
 > dat <- data.frame(v = v, cat = cut(v, seq(0, n_per_group * n_groups, by = n_per_group)))
 > r4 = aggregate(v ~ cat, data = dat, sum)$v
 > end = Sys.time()
 > end - start
 Time difference of 1.675465 secs
 > all.equal(r1, r4)
 [1] TRUE

And using the sparse matrix method takes about 335 ms:

 > library(Matrix)
 > start = Sys.time()
 > f = gl(n_groups, n_per_group)
 > r5 = as(f, "sparseMatrix") %*% v
 > r5 = as.numeric(r5[, 1])
 > end = Sys.time()
 > end - start
 Time difference of 0.334847 secs
 > all.equal(r1, r5)
 [1] TRUE
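
The timings above are single runs measured by subtracting two Sys.time() values. As an alternative sketch, base R's system.time() wraps the same measurement in one call. The smaller problem size below is an assumption chosen to keep the example quick, and the numbers will differ from machine to machine.

```r
# Smaller problem than above so the sketch runs quickly (sizes are an assumption)
n_per_group <- 100
n_groups <- 100
v <- 1:(n_per_group * n_groups)

# Assignments inside system.time() take effect in the calling environment
t_matrix <- system.time(
  r1 <- colSums(matrix(v, nrow = n_per_group, ncol = n_groups))
)
t_tapply <- system.time(
  r2 <- as.numeric(tapply(v, (seq_along(v) - 1) %/% n_per_group, sum))
)

# "elapsed" is the wall-clock time in seconds for each expression
timings <- c(matrix = t_matrix[["elapsed"]], tapply = t_tapply[["elapsed"]])
```

A single run is noisy for operations this fast, so repeating each expression several times gives a more reliable comparison.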

This solution requires the Matrix package.

 library(Matrix)
 v <- seq(100)    # example data
 f <- gl(10, 10)  # generate factor for grouping
 v_sums <- as(f, "sparseMatrix") %*% v
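
The idea behind this product can be seen without the Matrix package by building a dense indicator matrix in base R. This is a sketch for illustration only: it assumes the sparse coercion yields one row per factor level and one column per element, and a dense matrix like this would be wasteful for large inputs.

```r
v <- seq(100)    # example data
f <- gl(10, 10)  # factor with 10 levels, each repeated 10 times

# Dense 10 x 100 indicator matrix: entry [i, j] is 1 when element j has level i
ind <- outer(levels(f), as.character(f), "==") * 1

# Each row selects one group, so the product gives one sum per group
v_sums_dense <- drop(ind %*% v)
```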

Source: https://habr.com/ru/post/1397650/

