Debugging in plyr or dplyr - seeing which group

When I use plyr or dplyr to process a large dataset that is grouped by id, I sometimes get an error inside my function. I can use browser() or debugger() to examine what is happening, but I don't know whether the problem is with the first id or the 100th. I can use the debugger to stop at the error, but is there an easy way to see which identifier caused the problem, other than passing the identifier into the function purely for debugging? The example below illustrates this.

meanerr = function(y) {
  m = mean(y)
  stopifnot(!is.na(m))
  return(m)
}

d = data.frame(id=c(1,1,1,1,2,2), y=c(1,2,3,4,5,NA))
dsumm = ddply(d, "id", summarise, mean=meanerr(y))

Of course, this produces the error below, and when I dive into the dump I still have to work out which frame to look in:

> options(error=dump.frames)
> source('~/svn/pgm/test_debug_ddply.R')
Error: !is.na(m) is not TRUE
> debugger()
Message:  Error: !is.na(m) is not TRUE
Available environments had calls:
1: source("~/svn/pgm/test_debug_ddply.R")
2: withVisible(eval(ei, envir))
3: eval(ei, envir)
4: eval(expr, envir, enclos)
5: test_debug_ddply.R#9: ddply(d, "id", summarise, mean = meanerr(y))
6: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress, .inform = .inform, .parallel = .
7: llply(.data = .data, .fun = .fun, ..., .progress = .progress, .inform = .inform, .parallel = .p
8: loop_apply(n, do.ply)
9: (function (i) { piece <- pieces[[i]] if (.inform) { res <- try(.fun(piece, ...))
10: .fun(piece, ...)
11: eval(cols[[col]], .data, parent.frame())
12: eval(expr, envir, enclos)
13: meanerr(y)
14: test_debug_ddply.R#3: stopifnot(!is.na(m))
15: stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"), ch), call. = FALSE,

Maybe passing the identifier as an input every time, just to make debugging easier, is the way to go, but I was wondering whether there is something more elegant that professionals use, without threading extra variables through.

Andy

1 answer

I run into this all the time with dplyr's group_by(), and I have trouble using the usual options(error=recover).

I found that wrapping the offending function in tryCatch() does the trick:

> dsumm = ddply(d,"id",summarise,mean=tryCatch(meanerr(y),error=function(e){"error"}))
> dsumm
  id  mean
1  1   2.5
2  2 error
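One drawback of returning the string "error" is that the whole mean column becomes character. Here is a minimal sketch of the same tryCatch() idea that keeps the mean numeric and stores the error message in a separate column, so the failing ids can be read off directly. It uses base R (split() and lapply()) in place of ddply() so that it runs without plyr; summarise_by_id and the error column are hypothetical names made up for illustration, not part of plyr or dplyr.

```r
# The function from the question: fails when the group mean is NA.
meanerr <- function(y) {
  m <- mean(y)
  stopifnot(!is.na(m))
  m
}

d <- data.frame(id = c(1, 1, 1, 1, 2, 2), y = c(1, 2, 3, 4, 5, NA))

# Hypothetical helper (not a plyr/dplyr function): apply f to the y values
# of each id and return one row per id. A group that fails gets NA for the
# mean and its error message in a separate column, instead of stopping the run.
summarise_by_id <- function(data, f) {
  pieces <- split(data, data$id)
  rows <- lapply(names(pieces), function(id) {
    out <- tryCatch(
      list(mean = f(pieces[[id]]$y), error = NA_character_),
      error = function(e) list(mean = NA_real_, error = conditionMessage(e))
    )
    data.frame(id = id, mean = out$mean, error = out$error,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

dsumm <- summarise_by_id(d, meanerr)
print(dsumm)

# The ids whose groups raised an error (ids are character here because
# they come from the names of the split list).
failed_ids <- dsumm$id[!is.na(dsumm$error)]
print(failed_ids)
```

With the failing ids in hand, the offending group can be pulled out with d[d$id %in% failed_ids, ] and debugged on its own. Separately, ddply() has an .inform argument (it appears in the traceback in the question); setting .inform = TRUE makes plyr report the piece that failed, at the cost of slower processing.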

Source: https://habr.com/ru/post/1240551/

