When I use plyr and dplyr to process a large dataset grouped by id, my function sometimes throws an error. I can use browser() or debugger() to examine what is happening, but the problem is that I don't know whether the failure comes from the first id or the 100th. The debugger lets me stop at the error, but is there an easy way to see which identifier caused the problem, other than passing the identifier into the function purely for debugging? Here is an example.
```r
library(plyr)

meanerr = function(y) {
  m = mean(y)
  stopifnot(!is.na(m))
  return(m)
}

d = data.frame(id = c(1, 1, 1, 1, 2, 2), y = c(1, 2, 3, 4, 5, NA))
dsumm = ddply(d, "id", summarise, mean = meanerr(y))
```
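Before reaching for the debugger, a quick way to find the offending groups is to rerun the per-group step outside of ddply. The sketch below is a base-R stand-in (`split` plus `vapply`), not what ddply does internally; it assumes the same `meanerr` and `d` as above and records which ids fail instead of stopping at the first error.

```r
# Same function and data as in the example above
meanerr <- function(y) {
  m <- mean(y)
  stopifnot(!is.na(m))
  m
}
d <- data.frame(id = c(1, 1, 1, 1, 2, 2), y = c(1, 2, 3, 4, 5, NA))

# Split y by id; the names of the resulting list are the group ids
groups <- split(d$y, d$id)

# Run meanerr on each group; TRUE means that group signalled an error
failed <- vapply(groups, function(y) {
  tryCatch({ meanerr(y); FALSE }, error = function(e) TRUE)
}, logical(1))

bad_ids <- names(groups)[failed]
print(bad_ids)  # [1] "2"
```

This only tells you *which* ids fail; to look inside one of them you can then call `meanerr(groups[["2"]])` directly under `browser()`.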
As expected, this produces the error below, but when I dig into the dump I can't tell where to look:
```
> options(error=dump.frames)
> source('~/svn/pgm/test_debug_ddply.R')
Error: !is.na(m) is not TRUE
> debugger()
Message:  Error: !is.na(m) is not TRUE
Available environments had calls:
1: source("~/svn/pgm/test_debug_ddply.R")
2: withVisible(eval(ei, envir))
3: eval(ei, envir)
4: eval(expr, envir, enclos)
5: test_debug_ddply.R
```
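One way to make the error message itself name the group, without changing `meanerr`'s signature, is to wrap the per-group function so that any error is re-raised with the group's id prepended. This is a minimal sketch under that assumption; `with_group_id` and `summarise_piece` are hypothetical helpers, and `split`/`lapply`/`rbind` stand in for ddply so the example runs with base R only.

```r
meanerr <- function(y) {
  m <- mean(y)
  stopifnot(!is.na(m))
  m
}
d <- data.frame(id = c(1, 1, 1, 1, 2, 2), y = c(1, 2, 3, 4, 5, NA))

# Hypothetical helper: wraps a function of one group's data frame so that
# any error is re-raised with that group's id value(s) prepended.
with_group_id <- function(f, key = "id") {
  function(piece, ...) {
    tryCatch(
      f(piece, ...),
      error = function(e) {
        ids <- paste(unique(piece[[key]]), collapse = ",")
        stop(sprintf("%s=%s: %s", key, ids, conditionMessage(e)), call. = FALSE)
      }
    )
  }
}

# Per-group summary; meanerr keeps its original one-argument signature
summarise_piece <- with_group_id(function(piece) {
  data.frame(id = piece$id[1], mean = meanerr(piece$y))
})

# Base-R stand-in for ddply: split by id, apply per piece, bind the rows
msg <- tryCatch(
  do.call(rbind, lapply(split(d, d$id), summarise_piece)),
  error = function(e) conditionMessage(e)
)
print(msg)  # "id=2: !is.na(m) is not TRUE"
```

Because `summarise_piece` takes a single data frame, it should also work as the function argument of `ddply(d, "id", summarise_piece)`. The trade-off is that `tryCatch` unwinds the stack before re-raising, so a dump taken afterwards no longer contains the frames inside `meanerr`.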
Perhaps passing the identifier as an input every time is simply the way to go, but I was wondering whether there is a more elegant approach that experienced users rely on, one that does not require threading extra variables through the function.
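If you want to keep the original error and call stack intact (so that `options(error = dump.frames)` or `recover` still see the frames inside `meanerr`), a calling handler can record the failing group and then let the error continue. This is a sketch under the assumption that stashing the piece in a separate environment is acceptable; `debug_env`, `remember_piece` and `summarise_piece` are hypothetical names, and base R again stands in for ddply.

```r
meanerr <- function(y) {
  m <- mean(y)
  stopifnot(!is.na(m))
  m
}
d <- data.frame(id = c(1, 1, 1, 1, 2, 2), y = c(1, 2, 3, 4, 5, NA))

# Environment that will hold the group being processed when an error occurs
debug_env <- new.env()

# Hypothetical helper: records the failing piece, then returns normally so the
# original error keeps propagating unchanged (no stack unwinding here)
remember_piece <- function(f) {
  function(piece, ...) {
    withCallingHandlers(
      f(piece, ...),
      error = function(e) assign("bad_piece", piece, envir = debug_env)
    )
  }
}

summarise_piece <- remember_piece(function(piece) {
  data.frame(id = piece$id[1], mean = meanerr(piece$y))
})

# The error still propagates; try() only keeps this example from aborting
try(do.call(rbind, lapply(split(d, d$id), summarise_piece)), silent = TRUE)

# Inspect the group that triggered the error (the rows with id 2)
print(debug_env$bad_piece)
```

After an error, `debug_env$bad_piece` holds exactly the rows that were being summarised, so you can rerun `meanerr(debug_env$bad_piece$y)` under `browser()` without having added an id argument to `meanerr`.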
Andy