I found that data.table and dplyr give different results for what looks like the same operation. I would like to use the dplyr syntax, but have it compute the way data.table does. My use case is adding subtotals to a table. To do this, I need to aggregate each variable while keeping the same variable names in the aggregated version. data.table lets me aggregate a variable, keep the same name, and then use that variable again in a second aggregation: the second aggregation still sees the untransformed version. dplyr, however, uses the already-transformed version.
The dplyr documentation shows this example:
mtcars %>%
  group_by(cyl) %>%
  summarise(disp = mean(disp), sd = sd(disp))
This is basically the problem I am facing, and I am wondering whether there is a good way around it. One workaround I found is to give the transformed variable a different name and rename it at the end, but that does not seem very clean to me. If there is a good way to build subtotals, I would like to know. I searched this site and did not find this exact situation discussed. Any help would be greatly appreciated!
Here is a simple example, done once with data.table and once with dplyr. I want to take this small table and add a subtotal row containing the weighted average of the column of interest (Total).
library(data.table)
library(dplyr)

dt <- data.table(Group = LETTERS[1:5],
                 Count = c(1000, 1500, 1200, 2000, 5000),
                 Total = c(50, 300, 600, 400, 1000))

# Share of each group in the overall count
dt[, Count_Dist := Count / sum(Count)]

# Both expressions see the original per-row Count_Dist,
# so Weighted_Total is the weighted average I want
dt[, .(Count_Dist = sum(Count_Dist), Weighted_Total = sum(Count_Dist * Total))]

# Append the subtotal row to the detail rows
dt <- rbind(dt[, .(Group, Count_Dist, Total)],
            dt[, .(Group = "All", Count_Dist = sum(Count_Dist), Total = sum(Count_Dist * Total))])
setnames(dt, "Total", "Weighted_Avg_Total")
dt
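df <- data.frame(Group = LETTERS[1:5],
                 Count = c(1000, 1500, 1200, 2000, 5000),
                 Total = c(50, 300, 600, 400, 1000))

# Here Count_Dist has already been replaced by its sum (1) when
# Weighted_Total is computed, so the result is just sum(Total) = 2350
df %>%
  mutate(Count_Dist = Count / sum(Count)) %>%
  summarize(Count_Dist = sum(Count_Dist),
            Weighted_Total = sum(Count_Dist * Total))

# Same construction as the data.table version, but the "All" row
# ends up with the unweighted sum instead of the weighted average
df %>%
  mutate(Count_Dist = Count / sum(Count)) %>%
  select(Group, Count_Dist, Total) %>%
  rbind(df %>%
          mutate(Count_Dist = Count / sum(Count)) %>%
          summarize(Group = "All",
                    Count_Dist = sum(Count_Dist),
                    Total = sum(Count_Dist * Total))) %>%
  rename(Weighted_Avg_Total = Total)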
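For reference, this is roughly what the rename workaround mentioned earlier looks like. It is only a sketch: the temporary name Count_Dist_Sum and the helper objects with_dist, subtotal and result are placeholders I made up, and I use bind_rows() here instead of rbind() so the rows are matched by column name.

```r
library(dplyr)

df <- data.frame(Group = LETTERS[1:5],
                 Count = c(1000, 1500, 1200, 2000, 5000),
                 Total = c(50, 300, 600, 400, 1000))

# Detail rows with each group's share of the overall count
with_dist <- df %>%
  mutate(Count_Dist = Count / sum(Count))

# Subtotal row: store the sum under a temporary name (placeholder) so that
# the next expression still sees the original per-row Count_Dist
subtotal <- with_dist %>%
  summarize(Group = "All",
            Count_Dist_Sum = sum(Count_Dist),
            Total = sum(Count_Dist * Total)) %>%
  rename(Count_Dist = Count_Dist_Sum)

# Detail rows plus the subtotal row, with the final column name
result <- with_dist %>%
  select(Group, Count_Dist, Total) %>%
  bind_rows(subtotal) %>%
  rename(Weighted_Avg_Total = Total)

result
```

This gives the weighted average in the "All" row, but it needs the extra rename step for every reused variable, which is what I would like to avoid.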
Thanks again for any help!