A faster way to create a variable that aggregates a column by id

Is there a faster way to do this? I assume that this is unnecessarily slow and that a task like this can be accomplished with base R functions.

df <- ddply(df, "id", function(x) cbind(x, perc.total = sum(x$cand.perc))) 

I am completely new to R. I looked at by(), aggregate() and tapply(), but could not get them to work at all, or not in the way I wanted. Rather than returning a shorter vector, I want to attach the sum to the original data frame. What is the best way to do this?

Edit: Here is a comparison of the timings of the answers, applied to my data.

 > # My original solution
 > system.time( ddply(df, "id", function(x) cbind(x, perc.total = sum(x$cand.perc))) )
    user  system elapsed
  14.405   0.000  14.479
 > # Paul Hiemstra
 > system.time( ddply(df, "id", transform, perc.total = sum(cand.perc)) )
    user  system elapsed
  15.973   0.000  15.992
 > # Richie Cotton
 > system.time( with(df, tapply(df$cand.perc, df$id, sum))[df$id] )
    user  system elapsed
   0.048   0.000   0.048
 > # John
 > system.time( with(df, ave(cand.perc, id, FUN = sum)) )
    user  system elapsed
   0.032   0.000   0.030
 > # Christoph_J
 > system.time( df[ , list(perc.total = sum(cand.perc)), by="id"][df])
    user  system elapsed
   0.028   0.000   0.028
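For anyone who wants to repeat this kind of comparison without my data, here is a rough sketch on synthetic data. The data frame size, the number of ids and the random values are all assumptions made up for illustration, so the timings will not match the ones above. Only the two base R approaches are timed, so no extra packages are needed.

```r
# Synthetic stand-in for the real data (sizes and values are assumptions)
set.seed(1)
n  <- 1e5
df <- data.frame(
  id        = sample(1:2000, n, replace = TRUE),
  cand.perc = runif(n)
)

# Time ave(): returns one group sum per input row
t_ave <- system.time(
  res_ave <- with(df, ave(cand.perc, id, FUN = sum))
)

# Time tapply() plus a lookup by group name
t_tapply <- system.time(
  res_tapply <- with(df, tapply(cand.perc, id, sum))[as.character(df$id)]
)

# Both approaches should agree on the per-row group totals
stopifnot(isTRUE(all.equal(res_ave, as.vector(res_tapply))))

# Show elapsed seconds side by side
print(rbind(ave = t_ave, tapply = t_tapply)[, "elapsed", drop = FALSE])
```

The stopifnot() check is there so that a speed comparison is only reported when the two results are actually the same.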
6 answers

For any kind of aggregation where you want the result to be a vector of the same length as the input, with each group's value repeated for every row of that group, ave() is what you want.

 df$perc.total <- ave(df$cand.perc, df$id, FUN = sum) 
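As a minimal sketch of what ave() returns, assuming a toy data frame (the ids and values below are invented for illustration and are not the asker's data):

```r
# Toy data: the ids and percentages are invented for illustration only
df <- data.frame(
  id        = c("a", "a", "b", "b", "b"),
  cand.perc = c(10, 20, 5, 15, 30)
)

# ave() returns one value per input row:
# the group sum, repeated for every row of that group
df$perc.total <- ave(df$cand.perc, df$id, FUN = sum)

print(df)
```

Every row with id "a" gets 30 and every row with id "b" gets 50, so the new column lines up with the original rows without any merge step.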

Since you are completely new to R and speed is apparently a problem for you, I recommend the data.table package, which is very fast. One way to solve your problem in one line is as follows:

 library(data.table)
 DT <- data.table(ID = rep(c(1:3), each=3), cand.perc = 1:9, key="ID")
 DT <- DT[ , perc.total := sum(cand.perc), by = ID]
 DT
       ID cand.perc perc.total
  [1,]  1         1          6
  [2,]  1         2          6
  [3,]  1         3          6
  [4,]  2         4         15
  [5,]  2         5         15
  [6,]  2         6         15
  [7,]  3         7         24
  [8,]  3         8         24
  [9,]  3         9         24

Disclaimer: I'm not a data.table specialist (yet ;-), so there may be faster ways to do this. Check the package website to get started if you are interested in using the package: http://datatable.r-forge.r-project.org/


Use tapply to get group statistics, and then add them back to your dataset.

Reproducible example:

 means_by_wool <- with(warpbreaks, tapply(breaks, wool, mean))
 warpbreaks$means.by.wool <- means_by_wool[warpbreaks$wool]

Untested solution for your scenario:

 sum_by_id <- with(df, tapply(cand.perc, id, sum))
 # index by name, so that numeric ids are not treated as positions
 df$perc.total <- sum_by_id[as.character(df$id)]
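A small sketch of the lookup step, assuming toy data with numeric ids that are not simply 1, 2, 3 (all values are invented for illustration). Converting the id to character before indexing makes the lookup match on the group names of the tapply() result rather than on positions. The as.vector() call is an optional extra that drops the names.

```r
# Toy data with non-consecutive numeric ids (invented for illustration)
df <- data.frame(
  id        = c(10, 10, 25, 25, 25),
  cand.perc = c(10, 20, 5, 15, 30)
)

# One sum per group; the result is named by the group labels "10" and "25"
sum_by_id <- with(df, tapply(cand.perc, id, sum))

# Look the sums up by name so each row gets its own group's total
df$perc.total <- as.vector(sum_by_id[as.character(df$id)])

print(df)
```

With a factor grouping variable, as in the warpbreaks example, indexing by the factor itself works because the factor's integer codes follow the same level order as the tapply() result.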

Why are you using cbind(x, ...)? ddply combines the output of each group automatically. This should work:

 ddply(df, "id", transform, perc.total = sum(cand.perc)) 

Getting rid of the extra cbind should speed up the process.


ilprincipe, if none of the above meets your needs, you can try transposing your data

 dft=t(df) 

then use aggregate

 dfta=aggregate(dft,by=list(rownames(dft)),FUN=sum) 

then restore your row names.

 rownames(dfta)=dfta[,1]
 dfta=dfta[,2:ncol(dfta)]

Transpose back to the original orientation

 df2=t(dfta) 

and bind the result to the original data

 newdf=cbind(df,df2) 

You can also load your favorite foreach backend and try the .parallel = TRUE argument of ddply.


Source: https://habr.com/ru/post/902099/

