I am testing a function that calculates the pass rates for a specific criterion in my laboratory. The math behind it is very simple: given the number of tests that either passed or failed, what percentage passed?
The data will be provided as a column of values that are either P1 (passed on the first test), F1 (failed on the first test), P2 or F2 (passed or failed on the second test, respectively). I wrote the passRate function below to help calculate the overall pass rate (first and second attempts combined) as well as the pass rates for the first test and the second test in isolation.
The quality specialist who set up the verification parameters gave me a list of passes and failures, which I converted to a vector using the test_vector function below.
Everything looked fine until I got to the third row of the Pass data frame, which contains the pass/fail data from my quality specialist. Instead of returning a second-test pass rate of 100%, it returns NA, but only when I use mutate.
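As a rough illustration of the calculation only: the helper below, `pass_rate_sketch`, is a hypothetical stand-in and not the passRate function itself. It assumes the input is a character vector containing only the codes P1, F1, P2 and F2.

```r
# Hypothetical sketch (NOT the original passRate): compute pass rates
# from a character vector of result codes.
pass_rate_sketch <- function(results) {
  codes <- c("P1", "F1", "P2", "F2")
  # Count how often each code occurs; codes that never occur count as 0
  n <- sapply(codes, function(code) sum(results == code))
  c(total  = (n[["P1"]] + n[["P2"]]) / sum(n) * 100,
    first  = n[["P1"]] / (n[["P1"]] + n[["F1"]]) * 100,
    second = n[["P2"]] / (n[["P2"]] + n[["F2"]]) * 100)
}

# Made-up example input: two first-test passes, one first-test failure,
# one second-test pass
rates <- pass_rate_sketch(c("P1", "P1", "F1", "P2"))
print(rates)
```

If a test stage has no results at all, the corresponding rate is 0 / 0, which R evaluates to NaN.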
    library(dplyr)

    Pass <- structure(list(P1 = c(2L, 0L, 10L),
                           F1 = c(0L, 2L, 0L),
                           P2 = c(0L, 3L, 2L),
                           F2 = c(0L, 2L, 0L),
                           id = 1:3),
                      .Names = c("P1", "F1", "P2", "F2", "id"),
                      class = c("tbl_df", "data.frame"),
                      row.names = c(NA, -3L))
So, here is something similar to what I did with mutate:

    Pass %>%
      group_by(id) %>%
      mutate(pass_rate = (P1 + P2) / (P1 + P2 + F1 + F2) * 100,
             pass_rate1 = P1 / (P1 + F1) * 100,
             pass_rate2 = P2 / (P2 + F2) * 100)

    Source: local data frame [3 x 8]
    Groups: id [3]

         P1    F1    P2    F2    id pass_rate pass_rate1 pass_rate2
      (int) (int) (int) (int) (int)     (dbl)      (dbl)      (dbl)
    1     2     0     0     0     1 100.00000        100         NA
    2     0     2     3     2     2  42.85714          0         60
    3    10     0     2     0     3 100.00000        100         NA
Compare that with what I get when I use summarise:

    Pass %>%
      group_by(id) %>%
      summarise(pass_rate = (P1 + P2) / (P1 + P2 + F1 + F2) * 100,
                pass_rate1 = P1 / (P1 + F1) * 100,
                pass_rate2 = P2 / (P2 + F2) * 100)

    Source: local data frame [3 x 4]

         id pass_rate pass_rate1 pass_rate2
      (int)     (dbl)      (dbl)      (dbl)
    1     1 100.00000        100         NA
    2     2  42.85714          0         60
    3     3 100.00000        100        100
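For reference, the same arithmetic in plain vectorized base R (no dplyr, no grouping) gives the value I expect for the third row. This is only a sanity-check sketch; `pass_df` is an illustrative name for a copy of the same counts stored in an ordinary data.frame.

```r
# Same counts as in Pass, as a plain data.frame (pass_df is an illustrative name)
pass_df <- data.frame(P1 = c(2L, 0L, 10L),
                      F1 = c(0L, 2L, 0L),
                      P2 = c(0L, 3L, 2L),
                      F2 = c(0L, 2L, 0L),
                      id = 1:3)

# Second-test pass rate, computed row by row via vectorized arithmetic
pass_rate2 <- pass_df$P2 / (pass_df$P2 + pass_df$F2) * 100

# Row 1 is 0 / 0, so it comes out as NaN; row 2 is 60; row 3 is 100
print(pass_rate2)
```

So outside of a grouped mutate, only the first row (no second tests at all) lacks a defined pass rate.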
I would expect them to return the same results. My guess is that the problem lies somewhere in mutate, because it assumes that n rows in each group should produce n rows in the result (does it get confused when calculating n here?), whereas summarise knows that no matter how many rows a group starts with, it ends up with only one.
Does anyone have any thoughts on what mechanics are behind this behavior?