Choose minus operator in dplyr group_by

Question

Does anyone know a quick way to select all-but-one (or all-but-a-few) columns when using dplyr::group_by? Ultimately, I just want to aggregate over all distinct rows after removing a few selected columns, but I don't want to explicitly list all the grouping columns each time (since they are often added and removed in my analysis).

Example:

  > df <- data_frame(a = c(1,1,2,2), b = c("foo", "foo", "bar", "bar"), c = runif(4))
  > df
  Source: local data frame [4 x 3]

        a     b          c
    (dbl) (chr)      (dbl)
  1     1   foo 0.95460749
  2     1   foo 0.05094088
  3     2   bar 0.93032589
  4     2   bar 0.40081121

Now I want to aggregate by a and b, so I can do this:

  > df %>% group_by(a, b) %>% summarize(mean(c))
  Source: local data frame [2 x 3]
  Groups: a [?]

        a     b   mean(c)
    (dbl) (chr)     (dbl)
  1     1   foo 0.5027742
  2     2   bar 0.6655686

Great. But I would really like to specify just "not c", similar to dplyr::select(-c):

  > df %>% select(-c)
  Source: local data frame [4 x 2]

        a     b
    (dbl) (chr)
  1     1   foo
  2     1   foo
  3     2   bar
  4     2   bar

But group_by evaluates its arguments as expressions, so the equivalent does not work; it groups by the computed values of -c instead:

  > df %>% group_by(-c) %>% summarize(mean(c))
  Source: local data frame [4 x 2]

             -c    mean(c)
          (dbl)      (dbl)
  1 -0.95460749 0.95460749
  2 -0.93032589 0.93032589
  3 -0.40081121 0.40081121
  4 -0.05094088 0.05094088

Does anyone know if I have just missed a basic function or shortcut that would let me do this quickly?

Example use case: if df unexpectedly gains a new column d, I would like the downstream code to aggregate over the unique combinations of a, b, and d, without my having to add d to the group_by call explicitly.

r dplyr

mmuurr Mar 28 '16 at 16:44

1 answer

user295691 · Accepted Answer · 2017-12-28T17:21:46+0000

In current versions of dplyr, the group_by_at function, together with vars, performs this task:

  df %>% group_by_at(vars(-c)) %>% summarize(sum(c))
  # A tibble: 2 x 3
  # Groups:   a [?]
        a     b  `sum(c)`
    <dbl> <chr>     <dbl>
  1     1   foo 0.9851376
  2     2   bar 1.0954412

It appears that group_by_at was introduced in dplyr 0.7.0, released in June 2017.
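
To tie this back to the use case in the question, here is a minimal sketch in which the data frame has an extra column d that is picked up as a grouping column without being named. The values of c and d and the output column name mean_c are made up for illustration. The second form uses across(), which I believe supersedes group_by_at from dplyr 1.0.0 onward; treat it as an assumption if you are on an older version.

```r
library(dplyr)

# Toy data mirroring the question, plus a hypothetical extra column d.
# Fixed values are used instead of runif() so the result is reproducible.
df <- tibble(
  a = c(1, 1, 2, 2),
  b = c("foo", "foo", "bar", "bar"),
  d = c("x", "y", "y", "y"),
  c = c(0.2, 0.4, 0.6, 0.8)
)

# Group by every column except c; d is included automatically.
res_at <- df %>%
  group_by_at(vars(-c)) %>%
  summarize(mean_c = mean(c)) %>%
  ungroup()

# Same grouping expressed with across() (assumes dplyr >= 1.0.0).
res_across <- df %>%
  group_by(across(-c)) %>%
  summarize(mean_c = mean(c), .groups = "drop")
```

Both results should contain one row per unique combination of a, b, and d (three rows for this toy data), so adding or dropping columns upstream changes the grouping without any edit to the group_by call.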
