R colsums by group

In the following matrix dataset:

1 2 3 4 5 1950 7 20 21 15 61 1951 2 10 6 26 57 1952 12 27 43 37 34 1953 14 16 40 47 94 1954 2 17 62 113 101 1955 3 4 43 99 148 1956 2 47 31 85 79 1957 17 5 38 216 228 1958 11 20 15 76 68 1959 16 20 43 30 226 1960 9 28 28 70 201 1961 1 31 124 74 137 1962 12 25 37 41 200 

I am trying to calculate colSums by decade, i.e. find the sum of each column from 1950 to 1959, and then from 1960-69, etc.

I tried tapply, ddply, etc., but couldn't figure out what would really work.

+4
source share
4 answers

First, we set up the matrix used as input.

 Lines <- "1 2 3 4 5 1950 7 20 21 15 61 1951 2 10 6 26 57 1952 12 27 43 37 34 1953 14 16 40 47 94 1954 2 17 62 113 101 1955 3 4 43 99 148 1956 2 47 31 85 79 1957 17 5 38 216 228 1958 11 20 15 76 68 1959 16 20 43 30 226 1960 9 28 28 70 201 1961 1 31 124 74 137 1962 12 25 37 41 200 " DF <- read.table(text = Lines, check.names = FALSE) m <- as.matrix(DF) 

Now below we show some alternative solutions. (1) seems the most flexible in the sense that we can easily replace sum other functions to get various effects, but (2) is the shortest for this particular task. Also note that there are slight differences. (1) creates a data frame, while the other two create a matrix.

1) aggregate

 decade <- 10 * as.numeric(rownames(m)) %/% 10 m.ag <- aggregate(m, data.frame(decade), sum) 

which gives these data.frame:

 > m.ag decade 1 2 3 4 5 1 1950 86 186 342 744 1096 2 1960 22 84 189 185 538 

2) rowsum This one is shorter. This gives a matrix result.

 rowsum(m, decade) 

3) split/sapply . This one also produces a matrix. if we had DF we could replace as.data.frame (m) with DF shortening it a bit.

 t(sapply(split(as.data.frame(m), decade), colSums)) 

EDIT: Added solutions (2) and (3) Added some clarifications.

+13
source

First you need to define a grouping variable, then you can use your selection tool ( aggregate , ddply , whatever).

 > aggregate(x, by=list(trunc(as.numeric(rownames(x))/10)), sum) Group.1 X1 X2 X3 X4 X5 1 195 86 186 342 744 1096 2 196 22 84 189 185 538 
+6
source

There may be a more elegant basic R solution, but it works.

 # Construct a nicely named vector with which to split your data.frame breaks <- seq(1950, 2000, by=10) names <- c("50's", "60's", "70's", "80's", "90's") decade <- cut(as.numeric(row.names(df)), seq(1950, 2000, by=10), labels=names, right=FALSE) # by() splits df apart, operating on each of its pieces. # do.call(rbind, ...) sutures the results back together. do.call(rbind, by(df, decade, colSums)) # X1 X2 X3 X4 X5 # 50 86 186 342 744 1096 # 60 22 84 189 185 538 
+3
source

by is an option:

 by(x,10*(as.numeric(rownames(x))%/%10),colSums) INDICES: 1950 1 2 3 4 5 86 186 342 744 1096 ------------------------------------------------------------ INDICES: 1960 1 2 3 4 5 22 84 189 185 538 
+2
source

Source: https://habr.com/ru/post/1394031/


All Articles