I'm seeing what looks like strange behavior from the do function in dplyr 0.3.0.2, but maybe I'm misunderstanding something.
I have a data frame that looks like
    set.seed(668)
    stuff <- data.frame(name=c(rep("Frodzak", 5), rep("Dumpf", 4), rep("Ackpth", 6)),
                        state=c("AL", "AK", "AL", "KS", "OR", "LA", "MS", "KY",
                                "FL", "NY", "NY", "NJ", "PA", "NJ", "NY"),
                        important=c(F, F, T, F, F, T, F, F, F, T, F, F, F, F, F),
                        girth=rnorm(15, 250, 80),
                        stringsAsFactors=F)
    stuff

          name state important    girth
    1  Frodzak    AL     FALSE 148.5870
    2  Frodzak    AK     FALSE 321.4144
    3  Frodzak    AL      TRUE 224.8380
    4  Frodzak    KS     FALSE 315.9416
    5  Frodzak    OR     FALSE 331.4336
    6    Dumpf    LA      TRUE 317.4794
    7    Dumpf    MS     FALSE 170.4174
    8    Dumpf    KY     FALSE 275.4033
    9    Dumpf    FL     FALSE 240.9276
    10  Ackpth    NY      TRUE 145.6290
    11  Ackpth    NY     FALSE 267.6902
    12  Ackpth    NJ     FALSE 171.4015
    13  Ackpth    PA     FALSE 298.5841
    14  Ackpth    NJ     FALSE 249.5764
    15  Ackpth    NY     FALSE 276.5504
In my application, the "important" column will have exactly one TRUE within each group of rows that share the same "name". I want to subset the df to keep only those rows whose state matches the state of the "important" row (within each "name" group). In other words, I want to get
         name state important    girth
    1  Ackpth    NY      TRUE 145.6290
    2  Ackpth    NY     FALSE 267.6902
    3  Ackpth    NY     FALSE 276.5504
    4   Dumpf    LA      TRUE 317.4794
    5 Frodzak    AL     FALSE 148.5870
    6 Frodzak    AL      TRUE 224.8380
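To pin down the expected result independently of any dplyr version, here is a minimal base-R sketch. It assumes exactly one TRUE in "important" per "name" group, and the names imp_state and wanted are made up for illustration. It is only meant as a reference for the target subset, not as the approach in question; it should select the same six rows, in the original row order rather than sorted by name.

```r
set.seed(668)
stuff <- data.frame(
  name = c(rep("Frodzak", 5), rep("Dumpf", 4), rep("Ackpth", 6)),
  state = c("AL", "AK", "AL", "KS", "OR", "LA", "MS", "KY",
            "FL", "NY", "NY", "NJ", "PA", "NJ", "NY"),
  important = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE,
                FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  girth = rnorm(15, 250, 80),
  stringsAsFactors = FALSE
)

# Lookup vector (hypothetical helper): group name -> state of its important row.
# Assumes each name has exactly one important row.
imp_rows <- stuff[stuff$important, ]
imp_state <- setNames(imp_rows$state, imp_rows$name)

# Keep rows whose state equals the important state of their own group.
keep <- stuff$state == unname(imp_state[stuff$name])
wanted <- stuff[keep, ]
wanted
```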
If I run the following:
    importantState <- function(df) {
      impst <- df[df$important, "state"]
      if (length(impst) != 1) stop("group does not have one 'important'")
      impst
    }

    stuff %>%
      group_by(name) %>%
      do(.[.$state == importantState(.), ])
In dplyr 0.2 I get exactly what I expect (the 6-row subset above). However, if I run the same code with dplyr 0.3.0.2, it returns the entire original df (all 15 rows).
I looked at the 0.3 release notes on GitHub, but I don't see anything that would account for a material change in the behavior of do.
Can someone help me restore at least a little of my sanity by explaining what in heaven's name is happening here? Or does anyone have ideas for creative workarounds that I haven't thought of?