Ultimately, I am trying to achieve something similar to the following, but using dplyr instead of plyr:
    library(dplyr)
    probs <- seq(0, 1, 0.1)
    plyr::ldply(tapply(mtcars$mpg, mtcars$cyl, function(x) {
      quantile(x, probs = probs)
    }))
The best dplyr equivalent I can come up with is something like this:
    library(tidyr)
    probs <- seq(0, 1, 0.1)
    mtcars %>%
      group_by(cyl) %>%
      do(data.frame(prob = probs, stat = quantile(.$mpg, probs = probs))) %>%
      spread(prob, stat)
Note that this also requires tidyr::spread. The column headers lose their % formatting as well (for example, 0.1 instead of 10%), although in exchange the first column is now named cyl instead of .id.
Questions:
- Is there a better dplyr approach for doing this tapply %>% ldply chain?
- Is there a way to get the best of both worlds without jumping through too many hoops? That is, get the % formatting and the correct column name cyl for the first column?
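For what it is worth, one direction that might address the second question is to use the names that quantile() already attaches to its result (0%, 10%, ...) as the spread key instead of the numeric probs. The sketch below is only an untested-in-production idea under that assumption; the names q and result are placeholders I made up, and the key is turned into a factor so that spread keeps the columns in probability order rather than sorting them alphabetically.

```r
library(dplyr)
library(tidyr)

probs <- seq(0, 1, 0.1)

result <- mtcars %>%
  group_by(cyl) %>%
  do({
    # quantile() returns a named vector; the names carry the % labels
    q <- quantile(.$mpg, probs = probs)
    data.frame(
      # factor levels fix the column order (0%, 10%, ..., 100%)
      prob = factor(names(q), levels = names(q)),
      stat = unname(q)
    )
  }) %>%
  spread(prob, stat)

print(result)
```

This should give one row per cyl value, with cyl as the first column and the % labels as the remaining column headers, but it still relies on do() and spread(), so it is not necessarily any cleaner than the attempt above.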