This seems to be a problem with the way dplyr sets up the environment for calling data.table. The problem appears in the dplyr:::summarise_.grouped_dt function. Currently it looks like
    function (.data, ..., .dots)
    {
        dots <- lazyeval::all_dots(.dots, ..., all_named = TRUE)
        for (i in seq_along(dots)) {
            if (identical(dots[[i]]$expr, quote(n()))) {
                dots[[i]]$expr <- quote(.N)
            }
        }
        list_call <- lazyeval::make_call(quote(list), dots)
        call <- substitute(dt[, list_call, by = vars], list(list_call = list_call$expr))
        env <- dt_env(.data, parent.frame())
        out <- eval(call, env)
        grouped_dt(out, drop_last(groups(.data)), copy = FALSE)
    }
    <environment: namespace:dplyr>
and if we debug this function and look at the trace when it is called, we see
    where 1: summarise_.grouped_dt(.data, .dots = lazyeval::lazy_dots(...))
    where 2: summarise_(.data, .dots = lazyeval::lazy_dots(...))
    where 3: summarise(., sum.bad = sum(y == bad))
    where 4: function_list[[k]](value)
    where 5: withVisible(function_list[[k]](value))
    where 6: freduce(value, `_function_list`)
    where 7: `_fseq`(`_lhs`)
    where 8: eval(expr, envir, enclos)
    where 9: eval(quote(`_fseq`(`_lhs`)), env, env)
    where 10: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
    where 11 at
So the important line is this one:
    env <- dt_env(.data, parent.frame())
It sets up the environment that determines where the variables in the call are looked up. It simply uses parent.frame(), which refers to the frame the function was called from. But because you actually pass through several intermediate calls to get from your summarise() call inside f() to this function, that does not seem to be the right parent frame. If you instead run
env <- dt_env(.data, parent.frame(2))
in debug mode, it seems to land in the correct parent frame. So I think the problem is the extra hop from summarize() to summarize_(), because this
    ff <- function(x, y, bad) {
        z <- data.table(x, y, key = "x")
        z2 <- z %>% group_by(x) %>% summarise_(.dots = list(sum.bad = quote(sum(y == bad))))
        z2
    }
    ff(rnorm(100), rnorm(100) < 0, bad = FALSE)
seems to work. So really dplyr needs to set up the correct environment. The tricky part is that the right frame differs depending on whether you call summarize() or summarize_() directly. Perhaps summarise() could call summarise_() via eval() so that it sees the same parent.frame. But I would probably file this as a bug report and let Hadley decide how to fix it. Something like
    summarise <- function(.data, ...) {
        call <- match.call()
        call <- as.call(c(as.list(call)[1:2], list(.dots = as.list(call)[-(1:2)])))
        call[[1]] <- quote(summarise_)
        eval(call, envir = parent.frame())
    }
would be a "traditional" way to do this. I'm not sure whether the lazyeval package has a nicer way to do it.
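To make the frame issue easier to see without dplyr or data.table, here is a minimal base R sketch of the same pattern. The names inner_(), wrapper_naive(), wrapper_fixed() and the f_*() callers are hypothetical stand-ins for illustration only (inner_() plays the role of summarise_(), the wrappers play the role of summarise()); this is not dplyr's actual code.

```r
# Hypothetical stand-in for summarise_(): evaluates a quoted expression
# in the frame of whoever called it.
inner_ <- function(expr) {
  env <- parent.frame()
  eval(expr, env)
}

# Naive wrapper (stand-in for summarise()): it adds one extra frame, so
# parent.frame() inside inner_() is wrapper_naive's frame, not the user's.
wrapper_naive <- function(expr) {
  inner_(substitute(expr))
}

# Wrapper that rewrites its own call into a call to inner_() and evaluates
# that call in the caller's frame, so inner_() sees the same parent.frame()
# as if the user had called it directly.
wrapper_fixed <- function(expr) {
  cl <- match.call()
  cl[[1]] <- quote(inner_)
  cl$expr <- call("quote", cl$expr)  # pass the expression on unevaluated
  caller <- parent.frame()
  eval(cl, envir = caller)
}

# 'bad' only exists as an argument of these functions, like in f() above.
f_direct <- function(bad) inner_(quote(bad + 1))
f_naive  <- function(bad) wrapper_naive(bad + 1)
f_fixed  <- function(bad) wrapper_fixed(bad + 1)

f_direct(10)  # 11
f_fixed(10)   # 11
# Fails, because 'bad' is not visible from wrapper_naive's frame:
tryCatch(f_naive(10), error = function(e) conditionMessage(e))
```

Note that the rewritten call is evaluated in the caller's frame, so this approach assumes inner_() can be found from there. That holds in this sketch because everything is defined at top level.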
Tested with data.table_1.9.2 and dplyr_0.3.0.2