Group by cumulative dynamic column name

Question


Is it possible for group_by to use regular-expression matching on column names in dplyr?

 library(dplyr) # dplyr_0.5.0; R version 3.3.2 (2016-10-31)

 # dummy data
 set.seed(1)
 df1 <- sample_n(iris, 20) %>%
   mutate(Sepal.Length = round(Sepal.Length),
          Sepal.Width = round(Sepal.Width))

Group by, static version (looks and works fine, but imagine having 10-20 columns):

 df1 %>%
   group_by(Sepal.Length, Sepal.Width) %>%
   summarise(mySum = sum(Petal.Length))

Group by, dynamic version (ugly):

 df1 %>%
   group_by_(.dots = colnames(df1)[grepl("^Sepal", colnames(df1))]) %>%
   summarise(mySum = sum(Petal.Length))

Ideally, something like this (doesn't work, since starts_with returns indexes):

 df1 %>%
   group_by(starts_with("Sepal")) %>%
   summarise(mySum = sum(Petal.Length))

 Error in eval(expr, envir, enclos) : wrong result size (0), expected 20 or 1

Expected Result:

 # Source: local data frame [6 x 3]
 # Groups: Sepal.Length [?]
 #
 #   Sepal.Length Sepal.Width mySum
 #          <dbl>       <dbl> <dbl>
 # 1            4           3   1.4
 # 2            5           3  10.9
 # 3            6           2   4.0
 # 4            6           3  43.7
 # 5            7           3  15.7
 # 6            8           4   6.4

Note: this sounds like it may be a duplicate; kindly link the relevant posts, if any.

+5

r aggregate dplyr

zx8754 Apr 05 '17 at 10:54


2 answers

If you just want to stick to dplyr functions, you can try:

 df1 %>%
   group_by_(.dots = df1 %>% select(contains("Sepal")) %>% colnames()) %>%
   summarise(mySum = sum(Petal.Length))

It is not necessarily much prettier, but it gets rid of the regular expression.

+1

Aramis7d Apr 05 '17 at 12:04


zx8754 · Accepted Answer · 2017-04-05T19:47:51+0000

~~This feature will be implemented in a future version.~~ See GitHub issue #2619.

The solution would be to use the group_by_at function:

 df1 %>%
   group_by_at(vars(starts_with("Sepal"))) %>%
   summarise(mySum = sum(Petal.Length))
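Since the question asks about regular expressions specifically, the same pattern should also work with the matches() select helper, which takes a regex instead of a literal prefix. A minimal sketch, assuming a dplyr version that provides group_by_at() and vars(), and reusing the dummy data from the question; the name by_regex is made up for illustration:

```r
library(dplyr)

# Dummy data, as in the question (the sampled rows depend on the R version's RNG)
set.seed(1)
df1 <- sample_n(iris, 20) %>%
  mutate(Sepal.Length = round(Sepal.Length),
         Sepal.Width  = round(Sepal.Width))

# Group by every column whose name matches the regular expression "^Sepal"
by_regex <- df1 %>%
  group_by_at(vars(matches("^Sepal"))) %>%
  summarise(mySum = sum(Petal.Length))

by_regex
```

This should give the same groups as the static group_by(Sepal.Length, Sepal.Width) call, since both Sepal columns match the pattern.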

Edit: this is now implemented in dplyr_0.7.1.
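For readers on newer dplyr releases: from dplyr 1.0.0 onward, the scoped verbs such as group_by_at() are marked as superseded in favor of across(). A sketch of what looks like the equivalent call, assuming dplyr >= 1.0.0 is installed; by_across is an illustrative name, not something from the original post:

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, which provides across() and .groups

# Dummy data, as in the question
set.seed(1)
df1 <- sample_n(iris, 20) %>%
  mutate(Sepal.Length = round(Sepal.Length),
         Sepal.Width  = round(Sepal.Width))

# across() lets group_by() accept select helpers such as starts_with()
by_across <- df1 %>%
  group_by(across(starts_with("Sepal"))) %>%
  summarise(mySum = sum(Petal.Length), .groups = "drop")  # return an ungrouped result

by_across
```

Swapping starts_with("Sepal") for matches("^Sepal") gives the regular-expression version of the same grouping.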
