I am using dplyr's group_by and summarise functions with a custom aggregate function and am seeing strange behavior: the aggregate function seems to be evaluated twice for each group.
Here is a minimal example:
    library(dplyr)

    # Custom aggregate; the prints make every invocation visible
    aggFun <- function(x) {
      print("Inside function")
      print(rnorm(1))
      sum(x)
    }

    df <- data.frame(key = rep("a", 3), val = 1:3)

    df %>%
      group_by(key) %>%
      summarise(sum = aggFun(val))
The following is displayed:
    [1] "Inside function"
    [1] 0.3230769
    [1] "Inside function"
    [1] -0.3347653
    # A tibble: 1 × 2
         key   sum
      <fctr> <int>
    1      a     6
Since there is only one group, shouldn't the function be evaluated only once? I am seeing the same thing in a large application and am worried that it might hurt performance. Or am I missing something?
Solved by updating to the GitHub version.
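To check whether the double evaluation is still happening on a given installation, one option is to count the calls instead of reading printed output. This is only a minimal sketch: the `counter` environment and the `countingAgg` name are made up for illustration, and the count you get depends on the dplyr version installed, so no expected value is shown.

```r
library(dplyr)

# Hypothetical helper: an environment that holds a running call count
counter <- new.env()
counter$n <- 0

# Same aggregate as in the question, but it increments the counter instead of printing
countingAgg <- function(x) {
  counter$n <- counter$n + 1
  sum(x)
}

df <- data.frame(key = rep("a", 3), val = 1:3)

res <- df %>%
  group_by(key) %>%
  summarise(sum = countingAgg(val))

# Number of times the aggregate was invoked for the single group
counter$n

# Installed dplyr version, to compare before and after updating
packageVersion("dplyr")
```

If `counter$n` is larger than the number of groups, the function is being evaluated more than once per group on that version.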