What is the dplyr equivalent of plyr::ldply(tapply) in R?

Ultimately, I am trying to achieve something similar to the following, but using dplyr instead of plyr:

    library(dplyr)
    probs = seq(0, 1, 0.1)
    plyr::ldply(tapply(mtcars$mpg, mtcars$cyl, function(x) {
      quantile(x, probs = probs)
    }))
    #   .id   0%   10%   20%   30%   40%  50%   60%   70%   80%   90% 100%
    # 1   4 21.4 21.50 22.80 22.80 24.40 26.0 27.30 30.40 30.40 32.40 33.9
    # 2   6 17.8 17.98 18.32 18.98 19.40 19.7 20.48 21.00 21.00 21.16 21.4
    # 3   8 10.4 11.27 13.90 14.66 15.04 15.2 15.44 15.86 16.76 18.28 19.2

The best dplyr equivalent I can come up with is something like this:

    library(tidyr)
    probs = seq(0, 1, 0.1)
    mtcars %>%
      group_by(cyl) %>%
      do(data.frame(prob = probs, stat = quantile(.$mpg, probs = probs))) %>%
      spread(prob, stat)
    #   cyl    0   0.1   0.2   0.3   0.4  0.5   0.6   0.7   0.8   0.9    1
    # 1   4 21.4 21.50 22.80 22.80 24.40 26.0 27.30 30.40 30.40 32.40 33.9
    # 2   6 17.8 17.98 18.32 18.98 19.40 19.7 20.48 21.00 21.00 21.16 21.4
    # 3   8 10.4 11.27 13.90 14.66 15.04 15.2 15.44 15.86 16.76 18.28 19.2

Note that I also need tidyr::spread here. Also note the trade-off: the first column is now correctly named cyl instead of .id, but the column headers have lost their % formatting.
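To illustrate where the % headers go: the labels live in the names of the vector returned by quantile(), while the prob column above is filled from the numeric probs. The following is a minimal base R sketch; the object names q, long and long_pct are hypothetical and only serve the illustration.

```r
probs <- seq(0, 1, 0.1)
q <- quantile(mtcars$mpg, probs = probs)

# The percent labels are stored as the names of the quantile vector.
names(q)

# Long format as in the attempt above: the key column is numeric,
# so any headers built from it carry no percent sign.
long <- data.frame(prob = probs, stat = q)

# Alternative key column: the percent labels as a factor whose levels
# follow the order of probs (plain character sorting would put "100%"
# before "20%").
long_pct <- data.frame(
  prob = factor(names(q), levels = names(q)),
  stat = unname(q)
)
levels(long_pct$prob)
```

Using such a factor as the key column is one possible way to keep the % headers when spreading, assuming the new columns are ordered by factor level.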

Questions:

  • Is there a better dplyr approach for this tapply %>% ldply chain?
  • Is there a way to get the best of both worlds without jumping through too many hoops? That is, to get both the % formatting and the correct column name cyl for the first column?
2 answers

Using dplyr:

    library(dplyr)
    mtcars %>%
      group_by(cyl) %>%
      do(data.frame(as.list(quantile(.$mpg, probs = probs)), check.names = FALSE))
    #  cyl   0%   10%   20%   30%   40%  50%   60%   70%   80%   90% 100%
    #1   4 21.4 21.50 22.80 22.80 24.40 26.0 27.30 30.40 30.40 32.40 33.9
    #2   6 17.8 17.98 18.32 18.98 19.40 19.7 20.48 21.00 21.00 21.16 21.4
    #3   8 10.4 11.27 13.90 14.66 15.04 15.2 15.44 15.86 16.76 18.28 19.2
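The check.names = FALSE argument is what keeps the % headers in the dplyr version. A small base R sketch of the difference, run on the whole mpg column rather than per group; the names q, mangled and kept are hypothetical:

```r
probs <- seq(0, 1, 0.1)
q <- quantile(mtcars$mpg, probs = probs)

# With the default check.names = TRUE, data.frame() passes the column
# names through make.names(), so "0%" becomes a syntactic name.
mangled <- data.frame(as.list(q))
names(mangled)

# With check.names = FALSE the quantile labels are kept as they are.
kept <- data.frame(as.list(q), check.names = FALSE)
names(kept)
```

Both calls return a one-row data frame with 11 columns; only the column names differ.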

Or an option using data.table:

    library(data.table)
    as.data.table(mtcars)[, as.list(quantile(mpg, probs = probs)), cyl]
    #   cyl   0%   10%   20%   30%   40%  50%   60%   70%   80%   90% 100%
    #1:   6 17.8 17.98 18.32 18.98 19.40 19.7 20.48 21.00 21.00 21.16 21.4
    #2:   4 21.4 21.50 22.80 22.80 24.40 26.0 27.30 30.40 30.40 32.40 33.9
    #3:   8 10.4 11.27 13.90 14.66 15.04 15.2 15.44 15.86 16.76 18.28 19.2

The @akrun version is good, but I would use data_frame_ inside the do statement.

    mtcars %>%
      group_by(cyl) %>%
      do(data_frame_(quantile(.$mpg, probs = probs)))
    ## Source: local data frame [3 x 12]
    ## Groups: cyl
    ##
    ##   cyl   0%   10%   20%   30%   40%  50%   60%   70%   80%   90% 100%
    ## 1   4 21.4 21.50 22.80 22.80 24.40 26.0 27.30 30.40 30.40 32.40 33.9
    ## 2   6 17.8 17.98 18.32 18.98 19.40 19.7 20.48 21.00 21.00 21.16 21.4
    ## 3   8 10.4 11.27 13.90 14.66 15.04 15.2 15.44 15.86 16.76 18.28 19.2

After looking further into why this works, it appears that data_frame_ differs from the usual SE (standard evaluation) logic used in dplyr. data_frame_ accepts only a single columns argument and really expects a lazy_dots object.

If it receives a vector instead, it happens to work because lazy evaluation of the individual elements still succeeds. So the ability to call data_frame_ on such a vector may be a bug rather than an intended feature.


Source: https://habr.com/ru/post/988389/

