Data.frame Group by column

I have a data frame, DF:

      A B
    1 1 2
    2 1 3
    3 2 3
    4 3 5
    5 3 6

Now I want to combine the rows that share a value in column A and take the sum of column B for each group.

For example:

      A  B
    1 1  5
    2 2  3
    3 3 11

I am currently doing this with an SQL query through the sqldf function, but for some reason it is very slow. Is there a more convenient way to do this? I could do it manually with a for loop, but that is slow as well. My SQL query: "Select A, Count(B) from DF group by A".

In general, whenever I use for loops instead of vectorized operations, performance is very slow, even for single procedures.
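For illustration, here is a minimal sketch of the contrast described above, assuming the five-row DF shown earlier. The names `keys`, `loop_sums` and `vec_sums` are made up for this example; `tapply()` is just one base R way to vectorize the grouped sum.

```r
# Data from the question.
DF <- data.frame(A = c(1, 1, 2, 3, 3), B = c(2, 3, 3, 5, 6))

# Loop version: one pass over the data per distinct value of A.
keys <- unique(DF$A)
loop_sums <- numeric(length(keys))
for (i in seq_along(keys)) {
  loop_sums[i] <- sum(DF$B[DF$A == keys[i]])
}

# Vectorized version: tapply() splits B by A and sums each group in one call.
vec_sums <- tapply(DF$B, DF$A, sum)

loop_sums            # 5 3 11
as.vector(vec_sums)  # 5 3 11
```

Both give the same sums here; the difference only starts to matter as the number of groups grows.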

+46
r aggregate
Sep 14 '13 at 8:36
4 answers

This is a common question. In base R, you can use aggregate. Assuming your data.frame is called "mydf", you can use the following.

    > aggregate(B ~ A, mydf, sum)
      A  B
    1 1  5
    2 2  3
    3 3 11
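To make this reproducible, here is a small self-contained sketch. It assumes "mydf" holds the five rows from the question, and it also shows the non-formula interface of aggregate, which should give the same sums. The names `res` and `res2` are only for illustration.

```r
# Assumed contents of mydf, taken from the question.
mydf <- data.frame(A = c(1, 1, 2, 3, 3), B = c(2, 3, 3, 5, 6))

# Formula interface: sum B within each value of A.
res <- aggregate(B ~ A, data = mydf, FUN = sum)

# Non-formula interface: the same grouping given as a named list.
res2 <- aggregate(mydf["B"], by = list(A = mydf$A), FUN = sum)

res
res2
```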

I would also recommend looking at the "data.table" package.

    > library(data.table)
    > DT <- data.table(mydf)
    > DT[, sum(B), by = A]
       A V1
    1: 1  5
    2: 2  3
    3: 3 11
+76
Sep 14 '13 at 8:39

Using dplyr:

    require(dplyr)
    df <- data.frame(A = c(1, 1, 2, 3, 3), B = c(2, 3, 3, 5, 6))
    df %>% group_by(A) %>% summarise(B = sum(B))
    ## Source: local data frame [3 x 2]
    ##
    ##   A  B
    ## 1 1  5
    ## 2 2  3
    ## 3 3 11

With sqldf:

    library(sqldf)
    sqldf('SELECT A, SUM(B) AS B FROM df GROUP BY A')
+14
Jan 31 '15 at 19:53

I would recommend looking at the plyr package. It may not be as fast as data.table or other packages, but it is quite instructive, especially when you are starting out with R and need to do some data manipulation.

    > DF <- data.frame(A = c("1", "1", "2", "3", "3"), B = c(2, 3, 3, 5, 6))
    > library(plyr)
    > DF.sum <- ddply(DF, c("A"), summarize, B = sum(B))
    > DF.sum
      A  B
    1 1  5
    2 2  3
    3 3 11
+7
Sep 14 '13 at 9:38
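    require(reshape2)
    # Melt to long format, then cast back, summing the values within each A.
    # Named long/wide rather than T, which would mask the shorthand for TRUE.
    long <- melt(df, id = c("A"))
    wide <- dcast(long, A ~ variable, sum)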

I am not sure of the exact advantage over aggregate.

+3
Jul 31 '15 at 0:40
