dplyr 0.6.0: programming without quotes

I am trying to write a simple wrapper around summarise() that summarises arbitrary variables by arbitrary groups. I have made progress now that I have the correct version of the library installed, but I am confused (again) about how to unquote arguments that hold multiple values.

I currently have the following function ...

 table_summary <- function(df = ., id = individual_id, select = c(), group = site, ...){
     ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
     quo_id     <- enquo(id)
     quo_select <- enquo(select)
     quo_group  <- enquo(group)
     ## Subset the data
     df <- df %>%
           dplyr::select(!!quo_id, !!quo_select, !!quo_group) %>%
           unique()
     ## gather() data, just in case there is > 1 variable selected to be summarised
     df <- df %>%
           gather(key = variable, value = value, !!quo_select)
     ## Summarise selected variables by specified groups
     results <- df %>%
                group_by(!!quo_group, variable) %>%
                summarise(n    = n(),
                          mean = mean(value, na.rm = TRUE))
     return(results)
 }

This gets most of the way there and works if I specify a single grouping variable ...

 > table_summary(df = mtcars, id = model, select = c(mpg), group = gear)
 # A tibble: 3 x 4
 # Groups:   c(gear) [?]
    gear variable     n     mean
   <dbl>    <chr> <int>    <dbl>
 1     3      mpg    15 16.10667
 2     4      mpg    12 24.53333
 3     5      mpg     5 21.38000

... but fails at group_by(!!quo_group, variable) when I specify more than one grouping variable, e.g. group = c(gear, hp) ...

 > mtcars$model <- rownames(mtcars)
 > table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear, hp))
 Error in mutate_impl(.data, dots) :
   Column `c(gear, hp)` must be length 32 (the group size) or one, not 64

I went back and re-read the dplyr programming documentation, which says that you can capture multiple variables using quos() instead of enquo() and then unquote-splice them with !!!, so I tried ...

 table_summary <- function(df = ., id = individual_id, select = c(), group = c(), digits = 3, ...){
     ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
     quo_id     <- enquo(id)
     quo_select <- enquo(select)
     quo_group  <- quos(group)  ## Use quos() rather than enquo()
     UQS(quo_group) %>% print() ## Check to see what quo_group holds
     ## Subset the data
     df <- df %>%
           dplyr::select(!!quo_id, !!quo_select, !!!quo_group) %>%
           unique()
     ## gather() data, just in case there is > 1 variable selected to be summarised
     df <- df %>%
           gather(key = variable, value = value, !!quo_select)
     ## Summarise selected variables by specified groups
     results <- df %>%
                group_by(!!!quo_group, variable) %>%
                summarise(n    = n(),
                          mean = mean(value, na.rm = TRUE))
     return(results)
 }

... which now fails at the first reference to !!!quo_group, within dplyr::select(), regardless of how many variables are specified under group = ...

 > table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear))
 [[1]]
 <quosure: frame>
 ~group

 attr(,"class")
 [1] "quosures"
 Error in overscope_eval_next(overscope, expr) : object 'gear' not found
 > traceback()
 17: .Call(rlang_eval, f_rhs(quo), overscope)
 16: overscope_eval_next(overscope, expr)
 15: FUN(X[[i]], ...)
 14: lapply(.x, .f, ...)
 13: map(.x[matches], .f, ...)
 12: map_if(ind_list, !is_helper, eval_tidy, data = names_list)
 11: select_vars(names(.data), !(!(!quos(...))))
 10: select.data.frame(., !(!quo_id), !(!quo_select), !(!(!quo_group)))
 9: dplyr::select(., !(!quo_id), !(!quo_select), !(!(!quo_group)))
 8: function_list[[i]](value)
 7: freduce(value, `_function_list`)
 6: `_fseq`(`_lhs`)
 5: eval(quote(`_fseq`(`_lhs`)), env, env)
 4: eval(quote(`_fseq`(`_lhs`)), env, env)
 3: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
 2: df %>% dplyr::select(!(!quo_id), !(!quo_select), !(!(!quo_group))) %>%
        unique()
 1: table_summary(df = mtcars, id = model, select = c(mpg), group = c(gear))

What seems odd, and what I think is the source of the problem, is that !!!quo_group (i.e. UQS(quo_group)) prints ~group rather than a list of quosures, one per variable. A list of quosures is what I get when I add print() to the worked example from the documentation ...

 > my_summarise <- function(df, ...) {
     group_by <- quos(...)
     UQS(group_by) %>% print()
     df %>%
       group_by(!!!group_by) %>%
       summarise(a = mean(a))
   }
 > df <- tibble(
     g1 = c(1, 1, 2, 2, 2),
     g2 = c(1, 2, 1, 2, 1),
     a = sample(5),
     b = sample(5)
   )
 > my_summarise(df, g1, g2)
 [[1]]
 <quosure: global>
 ~g1

 [[2]]
 <quosure: global>
 ~g2

 attr(,"class")
 [1] "quosures"
 # A tibble: 4 x 3
 # Groups:   g1 [?]
      g1    g2     a
   <dbl> <dbl> <dbl>
 1     1     1   1.0
 2     1     2   5.0
 3     2     1   2.5
 4     2     2   4.0

I would like to supply the grouping variables explicitly as a named argument to my function. Since the documentation example works when they are passed as ..., I decided to check whether my function works when the grouping variables are passed as ... too:

 table_summary <- function(df = ., id = individual_id, select = c(), group = c(), digits = 3, ...){
     ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
     quo_id     <- enquo(id)
     quo_select <- enquo(select)
     ## quo_group <- quos(group)
     quo_group  <- quos(...)
     UQS(quo_group) %>% print()
     ## Subset the data
     df <- df %>%
           dplyr::select(!!quo_id, !!quo_select, !!!quo_group) %>%
           unique()
     ## gather() data, just in case there is > 1 variable selected to be summarised
     df <- df %>%
           gather(key = variable, value = value, !!quo_select)
     ## Summarise selected variables by specified groups
     results <- df %>%
                group_by(!!!quo_group, variable) %>%
                summarise(n    = n(),
                          mean = mean(value, na.rm = TRUE))
     return(results)
 }

... but it does not: this time quos() unquote-splices to NULL, so the variables are neither selected nor grouped by ...

 > table_summary(df = mtcars, id = model, select = c(mpg), gear, hp)
 NULL
 # A tibble: 1 x 3
   variable     n     mean
      <chr> <int>    <dbl>
 1      mpg    32 20.09062
 > table_summary(df = mtcars, id = model, select = c(mpg), gear)
 NULL
 # A tibble: 1 x 3
   variable     n     mean
      <chr> <int>    <dbl>
 1      mpg    32 20.09062

I have gone round this loop several times now, trying each way of using enquo() and quos(), but I cannot see where I am going wrong, despite having read the dplyr programming documentation several times.

1 answer

IIUC your post, you want to supply c(col1, col2) to group_by(). This is not supported by that verb:

 group_by(mtcars, c(cyl, am))
 #> Error in mutate_impl(.data, dots) :
 #>   Column `c(cyl, am)` must be length 32 (the number of rows) or one, not 64

This is because group_by() has mutate semantics, not select semantics. This means that the expressions you supply to group_by() are transformation expressions. This is a surprising but very convenient feature. For example, you can group by disp cut into three intervals:

 group_by(mtcars, cut3 = cut(disp, 3)) 

This also means that if you supply c(cyl, am), it will concatenate the two columns and return a vector of length 64, whereas a length of 32 (the number of rows) was expected.
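The difference can be checked directly. The following is a minimal sketch, assuming dplyr is installed and that its `group_vars()` helper is available (it is not used elsewhere in this answer); the names `combined` and `by_cut` are only illustrative.

```r
library(dplyr)

# c() concatenates the two columns end to end, so the result has
# 32 + 32 values instead of one value per row.
combined <- c(mtcars$cyl, mtcars$am)
length(combined)
nrow(mtcars)

# A transformation that returns one value per row is accepted and
# becomes a new grouping column named cut3.
by_cut <- group_by(mtcars, cut3 = cut(disp, 3))
group_vars(by_cut)
```

The first two calls show the length mismatch behind the error message; the last shows that the grouping column is the newly created `cut3`, not an existing column of `mtcars`.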

So your problem is that you want a group_by() wrapper with select semantics. This is easy to do using dplyr::select_vars(), which will soon be extracted to the new tidyselect package:

 library("dplyr")

 group_wrapper <- function(df, groups = rlang::chr()) {
   groups <- select_vars(tbl_vars(df), !! enquo(groups))
   group_by(df, !!! rlang::syms(groups))
 }

Alternatively, you can wrap the new group_by_at() verb, which does have select semantics:

 group_wrapper <- function(df, groups = rlang::chr()) {
   group_by_at(df, vars(!! enquo(groups)))
 }

Let's try it:

 group_wrapper(mtcars, c(disp, am))
 #> # A tibble: 32 x 11
 #> # Groups:   disp, am [27]
 #>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
 #>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 #> 1  21.0     6   160   110  3.90  2.62  16.5     0     1     4     4
 #> # ... with 22 more rows

This interface has the advantage of supporting all select() operations to select columns for grouping.
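To illustrate what "all select() operations" covers, here is a small sketch that repeats the group_by_at() wrapper so that it runs on its own. It assumes a dplyr version that still provides `group_by_at()`, `vars()` and `group_vars()`. It uses base R's `character(0)` as the default instead of `rlang::chr()`, and the names `by_pair`, `by_helper` and `by_range` are hypothetical.

```r
library(dplyr)

# Same idea as the group_by_at() wrapper above; character(0) is used
# here as the empty default (an assumption made for this sketch).
group_wrapper <- function(df, groups = character(0)) {
  group_by_at(df, vars(!!enquo(groups)))
}

# Bare column names combined with c()
by_pair <- group_wrapper(mtcars, c(cyl, am))

# A select helper: every column whose name starts with "c"
by_helper <- group_wrapper(mtcars, starts_with("c"))

# A range of adjacent columns
by_range <- group_wrapper(mtcars, vs:gear)

group_vars(by_pair)
group_vars(by_helper)
group_vars(by_range)
```

With the column order of `mtcars`, the helper call should pick `cyl` and `carb`, and the range should pick `vs`, `am` and `gear`.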

Note that I use rlang::chr() as the default argument because c() returns NULL, which is not supported by the selection functions (we may change this in the future). chr(), called without arguments, returns a character vector of length 0.
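The point about the default value can be seen in base R alone. This is only an illustrative sketch: it uses `character(0)`, the base R way of writing a zero-length character vector, rather than calling `rlang::chr()` itself, and the name `empty` is hypothetical.

```r
# c() with no arguments is NULL, not an empty vector of any type.
is.null(c())

# A zero-length character vector is a real character vector with no elements.
empty <- character(0)
is.character(empty)
length(empty)
```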


Source: https://habr.com/ru/post/1268253/

