Adding another grouping with dplyr

I would like to add two computed columns to a data frame, each based on a different grouping, where the two sets of grouping columns overlap. For example:

 df <- df %>% group_by(a, b) %>% mutate(x = sum(d))
 df <- df %>% group_by(a, b, c) %>% mutate(y = sum(e))

Is there a faster / more elegant way to do this? I was hoping to do something like:

 df <- df %>% group_by(a, b) %>% mutate(x = sum(d)) %>% group_by(c) %>% mutate(y = sum(e)) 

Or maybe save the result of the first group_by in a variable and then use it twice.

1 answer

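Use add = TRUE in the second group_by to add c to the existing groups, so that the data end up grouped by all three variables, as in the OP's example. (In dplyr 1.0.0 and later the argument is named .add, and add is deprecated.)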

  df %>% group_by(a, b) %>% mutate(x = sum(d)) %>% group_by(c, add=TRUE) %>% mutate(y = sum(e)) 

According to the documentation for ?group_by:

By default, when add = FALSE, group_by overrides existing groups. To add to existing groups instead, use add = TRUE
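
To make the difference concrete, here is a minimal sketch on a small made-up data frame (the values are hypothetical; only the column names a, b, c, d, e come from the question). It assumes dplyr 1.0.0 or later, where the argument is spelled .add; on older versions, substitute add = TRUE.

```r
library(dplyr)

# Hypothetical toy data; column names follow the question.
df <- data.frame(
  a = c(1, 1, 1, 1, 2, 2),
  b = c("p", "p", "p", "p", "q", "q"),
  c = c("u", "u", "v", "v", "u", "v"),
  d = 1:6,
  e = c(10, 20, 30, 40, 50, 60)
)

# Two separate groupings, as in the question.
two_step <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d)) %>%
  group_by(a, b, c) %>%
  mutate(y = sum(e)) %>%
  ungroup()

# One pipeline: the second group_by() extends the existing (a, b) groups.
# Assumption: dplyr >= 1.0.0 (older versions use `add = TRUE`).
chained <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d)) %>%
  group_by(c, .add = TRUE) %>%
  mutate(y = sum(e)) %>%
  ungroup()

# Without .add, the second group_by() replaces the (a, b) grouping.
overridden <- df %>%
  group_by(a, b) %>%
  group_by(c)

group_vars(overridden)                # only "c" remains
identical(two_step$x, chained$x)      # both pipelines give the same x
identical(two_step$y, chained$y)      # and the same y
```

The final ungroup() is there only so that later operations on the result do not silently inherit the three-variable grouping.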

This can also be done with a single call to group_by, but only by using a non-dplyr function for the second sum, such as base R's ave:

  df %>% group_by(a, b) %>% mutate(x = sum(d), y = ave(e, c, FUN = sum)) 
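
As a rough check of the single-group_by variant, the sketch below runs it on hypothetical toy data (only the column names come from the question). Note that ave() takes its function through the named FUN argument; any unnamed argument after the first is treated as a grouping variable. Inside a grouped mutate, ave() sees only the rows of the current (a, b) group, so splitting by c there has the same effect as grouping by a, b and c.

```r
library(dplyr)

# Hypothetical toy data; column names follow the question.
df <- data.frame(
  a = c(1, 1, 1, 1, 2, 2),
  b = c("p", "p", "p", "p", "q", "q"),
  c = c("u", "u", "v", "v", "u", "v"),
  d = 1:6,
  e = c(10, 20, 30, 40, 50, 60)
)

# x is summed per (a, b) group by dplyr;
# y is summed per level of c within each (a, b) group by base R's ave().
with_ave <- df %>%
  group_by(a, b) %>%
  mutate(
    x = sum(d),
    y = ave(e, c, FUN = sum)
  ) %>%
  ungroup()

with_ave
```

This keeps the pipeline to one grouping step, at the cost of mixing base R and dplyr idioms in the same mutate call.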

Source: https://habr.com/ru/post/1234816/

