A faster way to create a variable that aggregates a column by id

Question


Is there a faster way to do this? I assume my approach is unnecessarily slow and that the task can be done with base functions.

df <- ddply(df, "id", function(x) cbind(x, perc.total = sum(x$cand.perc)))

I am completely new to R. I looked at by(), aggregate() and tapply(), but could not get them to work at all, or at least not the way I wanted. Instead of returning a shorter vector with one value per id, I want to attach the sum to the original data frame. What is the best way to do this?

Edit: Here is a comparison of the timings of the answers, applied to my data.

 > # My original solution
 > system.time( ddply(df, "id", function(x) cbind(x, perc.total = sum(x$cand.perc))) )
    user  system elapsed
  14.405   0.000  14.479
 > # Paul Hiemstra
 > system.time( ddply(df, "id", transform, perc.total = sum(cand.perc)) )
    user  system elapsed
  15.973   0.000  15.992
 > # Richie Cotton
 > system.time( with(df, tapply(df$cand.perc, df$id, sum))[df$id] )
    user  system elapsed
   0.048   0.000   0.048
 > # John
 > system.time( with(df, ave(cand.perc, id, FUN = sum)) )
    user  system elapsed
   0.032   0.000   0.030
 > # Christoph_J
 > system.time( df[ , list(perc.total = sum(cand.perc)), by="id"][df])
    user  system elapsed
   0.028   0.000   0.028
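For anyone who wants to repeat this kind of comparison without my data, here is a minimal sketch using only base R. The data frame is synthetic (its size, the number of ids and the values are all made up), so the timings will not match the ones above; it only shows how the two fastest base approaches can be timed and checked against each other.

```r
# Synthetic stand-in for the real data (sizes and values are assumptions)
set.seed(1)
n <- 1e5
df <- data.frame(id = sample(5000, n, replace = TRUE),
                 cand.perc = runif(n))

# tapply, then look the group sums up again by name
t_tapply <- system.time({
  sums <- with(df, tapply(cand.perc, id, sum))
  res_tapply <- as.vector(sums[as.character(df$id)])
})

# ave returns a vector of the same length as the input directly
t_ave <- system.time(
  res_ave <- with(df, ave(cand.perc, id, FUN = sum))
)

# Both approaches should agree up to floating point rounding
all.equal(res_tapply, res_ave)

# Elapsed seconds for each approach
c(tapply = t_tapply[["elapsed"]], ave = t_ave[["elapsed"]])
```

Timings this small are noisy, so it is probably worth running the comparison several times before reading much into the difference.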

+6

performance r aggregate plyr

ilprincipe Nov 22 '11 at 10:54


6 answers

Since you are completely new to R and speed is apparently a problem for you, I recommend the data.table package, which is very fast. One way to solve your problem in one line is as follows:

 library(data.table)
 DT <- data.table(ID = rep(c(1:3), each=3), cand.perc = 1:9, key="ID")
 DT <- DT[ , perc.total := sum(cand.perc), by = ID]
 DT
       ID perc.total cand.perc
  [1,]  1          6         1
  [2,]  1          6         2
  [3,]  1          6         3
  [4,]  2         15         4
  [5,]  2         15         5
  [6,]  2         15         6
  [7,]  3         24         7
  [8,]  3         24         8
  [9,]  3         24         9

Disclaimer: I'm not a data.table specialist (yet ;-), so there may be faster ways to do this. Check the package website to get started if you are interested in using the package: http://datatable.r-forge.r-project.org/

+12

Christoph_J Nov 22 '11 at 13:03


Use tapply to get group statistics, and then add them back to your dataset.

Reproducible example:

 means_by_wool <- with(warpbreaks, tapply(breaks, wool, mean))
 warpbreaks$means.by.wool <- means_by_wool[warpbreaks$wool]

Suggested solution for your scenario:

 sum_by_id <- with(df, tapply(cand.perc, id, sum))
 df$perc.total <- sum_by_id[df$id]
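One caveat with the lookup step, sketched on a made-up data frame (the ids and values below are hypothetical): tapply returns a vector named by group, so indexing it with a numeric id selects by position rather than by name. Converting the id to character makes the lookup go by name, which should be safe for numeric and factor ids alike.

```r
# Hypothetical toy data: numeric ids that are not 1, 2, 3, ...
df <- data.frame(id = c(10, 10, 20, 30, 30),
                 cand.perc = c(1, 2, 3, 4, 5))

# Group sums, named by id: the names are "10", "20" and "30"
sum_by_id <- with(df, tapply(cand.perc, id, sum))

# Look up by name, not by position. Indexing with df$id directly would
# ask for elements 10, 20 and 30 of a length-3 vector.
# as.vector() drops the names and dimension that tapply attaches.
df$perc.total <- as.vector(sum_by_id[as.character(df$id)])
df
```

When id is a factor, as wool is in the warpbreaks example, indexing with the factor itself works because the integer codes line up with the order of the group results.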

+3

Richie Cotton Nov 22 '11 at 11:28


Why are you using cbind(x, ...)? The output of ddply is appended automatically. This should work:

 ddply(df, "id", transform, perc.total = sum(cand.perc))

Getting rid of the superfluous cbind should speed things up.

0

Paul Hiemstra Nov 22 '11 at 11:32


ilprincipe, if none of the above meets your needs, you can try transposing your data

 dft=t(df)

then use aggregate

 dfta=aggregate(dft,by=list(rownames(dft)),FUN=sum)

then restore the row names.

 rownames(dfta)=dfta[,1]
 dfta=dfta[,2:ncol(dfta)]

Transpose back to the original orientation

 df2=t(dfta)

and bind the result to the original data

 newdf=cbind(df,df2)

0

boczniak767 Nov 22 '11 at 12:50


You can also load your favorite foreach backend and try the .parallel = TRUE argument of ddply.

0

Zach Nov 23 '11 at 14:54


John · Accepted Answer · 2011-11-22T12:18:48+0000

For any kind of aggregation where you want the result to be the same length as the input vector, with the group value repeated for every row of its group, ave is what you want.

 df$perc.total <- ave(df$cand.perc, df$id, FUN = sum)
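A small sketch of what ave returns, on a made-up data frame (the values are hypothetical, chosen to match the 3 x 3 layout of the data.table answer):

```r
# Hypothetical toy data: three ids with three rows each
df <- data.frame(id = rep(1:3, each = 3), cand.perc = 1:9)

# ave() applies FUN within each id group and returns a vector
# as long as the input, so it can be assigned straight to a new column
df$perc.total <- ave(df$cand.perc, df$id, FUN = sum)
df$perc.total
```

Note that FUN has to be passed by name: the signature is ave(x, ..., FUN = mean), so an unnamed function would be taken as another grouping variable.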

