Odd behavior of the do() function in dplyr

I am seeing what looks like strange behavior from the do function in dplyr 0.3.0.2, but maybe I'm misunderstanding something.

I have a data frame that looks like

    set.seed(668)
    stuff <- data.frame(name=c(rep("Frodzak", 5), rep("Dumpf", 4), rep("Ackpth", 6)),
                        state=c("AL", "AK", "AL", "KS", "OR", "LA", "MS", "KY", "FL",
                                "NY", "NY", "NJ", "PA", "NJ", "NY"),
                        important=c(F, F, T, F, F, T, F, F, F, T, F, F, F, F, F),
                        girth=rnorm(15, 250, 80),
                        stringsAsFactors=F)
    stuff

          name state important    girth
    1  Frodzak    AL     FALSE 148.5870
    2  Frodzak    AK     FALSE 321.4144
    3  Frodzak    AL      TRUE 224.8380
    4  Frodzak    KS     FALSE 315.9416
    5  Frodzak    OR     FALSE 331.4336
    6    Dumpf    LA      TRUE 317.4794
    7    Dumpf    MS     FALSE 170.4174
    8    Dumpf    KY     FALSE 275.4033
    9    Dumpf    FL     FALSE 240.9276
    10  Ackpth    NY      TRUE 145.6290
    11  Ackpth    NY     FALSE 267.6902
    12  Ackpth    NJ     FALSE 171.4015
    13  Ackpth    PA     FALSE 298.5841
    14  Ackpth    NJ     FALSE 249.5764
    15  Ackpth    NY     FALSE 276.5504

In my application, the "important" column will have exactly one TRUE within each group of rows that share the same "name". I want to subset the data frame so that it keeps only those rows whose state matches the state of the "important" row (within each "name" group). In other words, I want to get

         name state important    girth
    1  Ackpth    NY      TRUE 145.6290
    2  Ackpth    NY     FALSE 267.6902
    3  Ackpth    NY     FALSE 276.5504
    4   Dumpf    LA      TRUE 317.4794
    5 Frodzak    AL     FALSE 148.5870
    6 Frodzak    AL      TRUE 224.8380

If I run the following:

    importantState <- function(df) {
      impst <- df[df$important, "state"]
      if (length(impst) != 1) stop("group does not have one 'important'")
      impst
    }

    stuff %>%
      group_by(name) %>%
      do(.[.$state == importantState(.), ])

In dplyr 0.2 I get exactly what I expect (the 6-row subset above). However, if I run the same code with dplyr 0.3.0.2, it returns the entire original data frame (all 15 rows).

I looked at the 0.3 release notes on GitHub, but I don't see anything that would explain a material change in the behavior of do.

Can someone help me restore at least a little of my sanity by explaining what on earth is happening here? Or does anyone have ideas for creative workarounds that I haven't thought of?

1 answer

Perhaps you could try filter here?

    stuff %>%
      group_by(name) %>%
      filter(state == state[important])

    #      name state important    girth
    # 1 Frodzak    AL     FALSE 148.5870
    # 2 Frodzak    AL      TRUE 224.8380
    # 3   Dumpf    LA      TRUE 317.4794
    # 4  Ackpth    NY      TRUE 145.6290
    # 5  Ackpth    NY     FALSE 267.6902
    # 6  Ackpth    NY     FALSE 276.5504
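
If you would rather sidestep do() entirely while you sort out the version difference, the same per-group subset can be sketched in base R with split/lapply/rbind. This is only a rough workaround, not an explanation of the dplyr change. The helper name keep_important_state is made up for this example; it keeps the question's check that each group has exactly one "important" row.

```r
# Same data as in the question.
set.seed(668)
stuff <- data.frame(
  name = c(rep("Frodzak", 5), rep("Dumpf", 4), rep("Ackpth", 6)),
  state = c("AL", "AK", "AL", "KS", "OR", "LA", "MS", "KY", "FL",
            "NY", "NY", "NJ", "PA", "NJ", "NY"),
  important = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE,
                TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  girth = rnorm(15, 250, 80),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of dplyr): for one group, keep the rows
# whose state equals the state of the single "important" row.
keep_important_state <- function(df) {
  impst <- df$state[df$important]
  if (length(impst) != 1) stop("group does not have exactly one 'important' row")
  df[df$state == impst, , drop = FALSE]
}

# Split by name, subset each piece, then bind the pieces back together.
groups <- split(stuff, stuff$name)
result <- do.call(rbind, lapply(groups, keep_important_state))
rownames(result) <- NULL  # drop the "Ackpth.10"-style row names rbind creates
result
```

Because split() orders the groups by name, the rows come back grouped as Ackpth, Dumpf, Frodzak, which is the order shown in the question's expected output.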

Source: https://habr.com/ru/post/1209491/