Data.frame Group by column

Question

I have a data frame, DF.

Say DF:

   A B
 1 1 2
 2 1 3
 3 2 3
 4 3 5
 5 3 6

Now I want to combine the rows that share the same value in column A and take the sum of column B for each group.

For example:

   A  B
 1 1  5
 2 2  3
 3 3 11

I am currently doing this with an SQL query through the sqldf function, but for some reason it is very slow. Is there a more efficient way to do this? I could do it manually with a for loop, but that is slow as well. My SQL query: "SELECT A, COUNT(B) FROM DF GROUP BY A".

In general, whenever I use for loops instead of vectorized operations, performance is very slow, even for single operations.

+46

r aggregate

nikosdi Sep 14 '13 at 8:36

4 answers

Using dplyr :

 require(dplyr)
 df <- data.frame(A = c(1, 1, 2, 3, 3), B = c(2, 3, 3, 5, 6))
 df %>% group_by(A) %>% summarise(B = sum(B))
 ## Source: local data frame [3 x 2]
 ##
 ##   A  B
 ## 1 1  5
 ## 2 2  3
 ## 3 3 11

With sqldf :

 library(sqldf)
 sqldf('SELECT A, SUM(B) AS B FROM df GROUP BY A')

+14

mpalanco Jan 31 '15 at 19:53

I would recommend having a look at the plyr package. It may not be as fast as data.table or other packages, but it is very instructive, especially when you are starting out with R and need to do some data manipulation.

 > DF <- data.frame(A = c("1", "1", "2", "3", "3"), B = c(2, 3, 3, 5, 6))
 > library(plyr)
 > DF.sum <- ddply(DF, c("A"), summarize, B = sum(B))
 > DF.sum
   A  B
 1 1  5
 2 2  3
 3 3 11

+7

r0bert Sep 14 '13 at 9:38

 require(reshape2)
 T <- melt(df, id = c("A"))
 T <- dcast(T, A ~ variable, sum)

I am not sure of the exact advantages over aggregate.

+3

Soc Jul 31 '15 at 0:40

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2013-09-14 08:39

This is a common question. In base R, you can use aggregate. Assuming your data.frame is called "mydf", the following works:

 > aggregate(B ~ A, mydf, sum)
   A  B
 1 1  5
 2 2  3
 3 3 11
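The aggregate call relies on "mydf" already existing. As a minimal self-contained sketch, assuming mydf holds the data from the question (that construction is an assumption, since this answer does not show it), the same sums can be cross-checked with tapply, another base R function:

```r
# Assumption: mydf holds the question's data; the answer does not define it.
mydf <- data.frame(A = c(1, 1, 2, 3, 3), B = c(2, 3, 3, 5, 6))

# Formula interface: sum B within each distinct value of A.
agg <- aggregate(B ~ A, data = mydf, FUN = sum)
print(agg)

# Cross-check with tapply, which returns a named vector
# (names are the group values) rather than a data frame.
sums <- tapply(mydf$B, mydf$A, sum)
print(sums)
```

aggregate is probably the better fit when the result should stay a data frame; tapply is handy when only a named vector of group sums is needed.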

I would also recommend looking at the "data.table" package.

 > library(data.table)
 > DT <- data.table(mydf)
 > DT[, sum(B), by = A]
    A V1
 1: 1  5
 2: 2  3
 3: 3 11
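In the output above the summed column gets the default name V1. A small sketch of one way to name it, assuming the data.table package is installed and that mydf holds the question's data (both are assumptions, not part of this answer):

```r
library(data.table)

# Assumption: mydf holds the question's data; the answer does not define it.
mydf <- data.frame(A = c(1, 1, 2, 3, 3), B = c(2, 3, 3, 5, 6))
DT <- as.data.table(mydf)

# .() is data.table's alias for list(); naming the list element
# names the resulting column B instead of the default V1.
res <- DT[, .(B = sum(B)), by = A]
print(res)
```

Groups appear in the order in which their values first occur in A.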
