Adding another grouping with dplyr

Question

I would like to group my data twice, using two sets of columns that overlap, and compute a sum for each grouping, i.e.:

 df <- df %>% group_by(a, b) %>% mutate(x = sum(d))
 df <- df %>% group_by(a, b, c) %>% mutate(y = sum(e))

Is there a faster / more elegant way to do this? I was hoping to do something like:

 df <- df %>% group_by(a, b) %>% mutate(x = sum(d)) %>% group_by(c) %>% mutate(y = sum(e))

Or maybe save the result of the first group_by and then use it twice.


Sam brightman Oct 29 '15 at 18:04

1 answer

akrun · Accepted Answer · 2015-10-29T18:08:06+0000

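We can use add=TRUE in the second group_by to add c to the existing groups, so that the data is grouped by all 3 variables (a, b, c) as in the OP's example. (In dplyr 1.0.0 and later this argument is named .add, and add is deprecated.)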

  df %>% group_by(a, b) %>% mutate(x = sum(d)) %>% group_by(c, add=TRUE) %>% mutate(y = sum(e))

According to the documentation (?group_by):

By default, when add = FALSE, group_by overrides existing groups. To add to existing groups instead, use add = TRUE
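To make the effect of adding a group concrete, here is a minimal sketch on a small made-up data frame. The values are hypothetical and only the column names follow the question. It assumes dplyr 1.0.0 or later, where the argument is spelled .add; on older versions, add = TRUE plays the same role.

```r
library(dplyr)

# Hypothetical toy data; column names follow the question
df <- tibble(
  a = c(1, 1, 1, 1, 2, 2),
  b = c("u", "u", "u", "u", "v", "v"),
  c = c("p", "p", "q", "q", "p", "q"),
  d = c(1, 2, 3, 4, 5, 6),
  e = c(10, 20, 30, 40, 50, 60)
)

res <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d)) %>%          # x: sum of d within each (a, b)
  group_by(c, .add = TRUE) %>%    # keep (a, b) and add c to the grouping
  mutate(y = sum(e))              # y: sum of e within each (a, b, c)

group_vars(res)  # "a" "b" "c"
res$x            # 10 10 10 10 11 11
res$y            # 30 30 70 70 50 60
```

Without .add = TRUE, the second group_by would replace the (a, b) grouping with c alone, and y would be summed over c only.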

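This can also be done with a single call to group_by, but only by computing the second sum with a base R function (ave) instead of a second grouping. Note that FUN must be passed by name, because ave treats unnamed extra arguments as grouping variables:

  df %>% group_by(a, b) %>% mutate(x = sum(d), y = ave(e, c, FUN = sum))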
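As a rough check that the single group_by version gives the same result as regrouping, the sketch below compares the two on hypothetical data (made-up values, column names from the question). It passes FUN by name to ave, since ave would otherwise treat the extra argument as another grouping variable, and it assumes dplyr 1.0.0 or later for .add.

```r
library(dplyr)

# Hypothetical toy data; column names follow the question
df <- tibble(
  a = c(1, 1, 1, 1, 2, 2),
  b = c("u", "u", "u", "u", "v", "v"),
  c = c("p", "p", "q", "q", "p", "q"),
  d = c(1, 2, 3, 4, 5, 6),
  e = c(10, 20, 30, 40, 50, 60)
)

# Single group_by: ave() sums e by c inside each (a, b) group
one_call <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d), y = ave(e, c, FUN = sum)) %>%
  ungroup()

# Two-step version: add c to the existing (a, b) grouping
two_step <- df %>%
  group_by(a, b) %>%
  mutate(x = sum(d)) %>%
  group_by(c, .add = TRUE) %>%
  mutate(y = sum(e)) %>%
  ungroup()

all.equal(one_call$y, two_step$y)  # TRUE
```

Inside mutate, c refers to the column named c rather than base R's c() function, because columns of the data take precedence over objects in the calling environment.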