R plyr - Ordering results from ddply?

Does anyone know a neat way to order the results coming out of a ddply summarise operation?

This is what I do to get the output ordered by descending depth:

    library(plyr)
    library(ggplot2)  # for the diamonds data set
    ddims <- ddply(diamonds, .(color), summarise, depth = mean(depth), table = mean(table))
    ddims <- ddims[order(-ddims$depth), ]

which gives:

    > ddims
      color    depth    table
    7     J 61.88722 57.81239
    6     I 61.84639 57.57728
    5     H 61.83685 57.51781
    4     G 61.75711 57.28863
    1     D 61.69813 57.40459
    3     F 61.69458 57.43354
    2     E 61.66209 57.49120

Not too ugly, but I'm hoping for a neat way to do it within ddply(). Does anyone know how?
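The ddims[order(-ddims$depth), ] step above is plain base R. For reference, order() also accepts decreasing = TRUE, which gives the same row order for a numeric column and, unlike unary minus, also works on a character column. A minimal base R sketch that rebuilds the summary from the values printed above (the small data frame is only a stand-in for the real ddply result):

```r
# Summary values copied from the ddims output above; row i holds the i-th color
ddims <- data.frame(
  color = c("D", "E", "F", "G", "H", "I", "J"),
  depth = c(61.69813, 61.66209, 61.69458, 61.75711, 61.83685, 61.84639, 61.88722),
  table = c(57.40459, 57.49120, 57.43354, 57.28863, 57.51781, 57.57728, 57.81239),
  stringsAsFactors = FALSE
)

# Same row order as ddims[order(-ddims$depth), ] for a numeric column
by_depth <- ddims[order(ddims$depth, decreasing = TRUE), ]
print(by_depth)

# Unary minus only works on numbers, so it errors on a character column ...
minus_fails <- tryCatch({
  order(-ddims$color)
  FALSE
}, error = function(e) TRUE)

# ... whereas decreasing = TRUE sorts characters as well
by_color_desc <- ddims[order(ddims$color, decreasing = TRUE), ]
```

The row names of by_depth (7, 6, 5, 4, 1, 3, 2) match the ordering shown in the question's output.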

Hadley's ggplot2 book has this example combining ddply and subset, but it doesn't actually sort the output; it just picks out the two smallest diamonds per group:

    ddply(diamonds, .(color), subset, order(carat) <= 2)
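One caveat about that subset condition: order(carat) <= 2 does not, in general, select the two smallest carats. order() returns the positions that would sort the vector, not each element's rank; rank() is the function that compares each value against a cut-off. A small base R check with made-up carat values for a single group:

```r
# Hypothetical carat values for one color group, deliberately unsorted
carat <- c(0.7, 0.3, 0.5)

order(carat)              # 2 3 1: positions of the values in ascending order
rank(carat)               # 3 1 2: rank of each value

carat[order(carat) <= 2]  # 0.7 0.5: not the two smallest
carat[rank(carat) <= 2]   # 0.3 0.5: the two smallest (ties aside)
```

Swapping rank() into the book's call, i.e. ddply(diamonds, .(color), subset, rank(carat) <= 2), should keep the two smallest diamonds per color, although ties in carat can change how many rows pass the filter.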
4 answers

I'll take this opportunity to advertise data.table, which is faster and (in my perception) at least as elegant to write:

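    library(data.table)
    library(ggplot2)  # for the diamonds data set
    ddims <- data.table(diamonds)
    # group means by color, then order by descending depth as asked
    system.time(ddims <- ddims[, list(depth = mean(depth), table = mean(table)),
                               by = color][order(-depth)])
    #    user  system elapsed
    #   0.003   0.000   0.004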

By contrast, your ddply code takes about 30 times longer, even without the ordering step:

       user  system elapsed
      0.106   0.010   0.119

With all due respect to Hadley for his great work, e.g. on ggplot2, and his general awesomeness, I have to admit that for me data.table has completely replaced ddply, for speed reasons.
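data.table also provides setorder(), which reorders a table in place, by reference, instead of creating a sorted copy. A minimal sketch on a hypothetical toy table standing in for diamonds, showing both the chained [order(-depth)] form and setorder():

```r
library(data.table)

# Hypothetical toy data standing in for diamonds
toy <- data.table(
  color = c("D", "D", "E", "E", "J", "J"),
  depth = c(61.0, 62.0, 61.5, 61.7, 62.2, 61.6),
  table = c(57, 58, 56, 57, 58, 58)
)

# Grouped means, then a chained [order(-depth)] for descending depth
res <- toy[, list(depth = mean(depth), table = mean(table)), by = color][order(-depth)]

# Alternative: compute the summary, then reorder it in place by reference
res2 <- toy[, list(depth = mean(depth), table = mean(table)), by = color]
setorder(res2, -depth)
print(res2)
```

Both forms should give the same order here; setorder() mainly matters when the table is large and an extra copy is worth avoiding.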


Yes, to sort, you can simply nest your ddply inside another ddply. Here's how you would use ddply to sort on a single column, for example the table column:

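    library(plyr)
    library(ggplot2)  # for the diamonds data set
    # The outer ddply() is called without a function, so it simply recombines
    # the inner summary's pieces in order of the table column
    ddimsSortedTable <- ddply(ddply(diamonds, .(color), summarise,
                                    depth = mean(depth), table = mean(table)),
                              .(table))
    ddimsSortedTable
    #   color    depth    table
    # 1     G 61.75711 57.28863
    # 2     D 61.69813 57.40459
    # 3     F 61.69458 57.43354
    # 4     E 61.66209 57.49120
    # 5     H 61.83685 57.51781
    # 6     I 61.84639 57.57728
    # 7     J 61.88722 57.81239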
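plyr itself also ships arrange() and desc(), which order a data frame by its columns, so the outer ddply() can be replaced by a direct sort. A minimal sketch, using a hypothetical toy data frame instead of diamonds:

```r
library(plyr)

# Hypothetical toy data standing in for diamonds
toy <- data.frame(
  color = c("D", "D", "E", "E", "J", "J"),
  depth = c(61.0, 62.0, 61.5, 61.7, 62.2, 61.6),
  table = c(57, 58, 56, 57, 58, 58),
  stringsAsFactors = FALSE
)

# Per-color means, as in the question
summ <- ddply(toy, .(color), summarise, depth = mean(depth), table = mean(table))

# arrange() orders the summary directly
by_table <- arrange(summ, table)        # ascending table, like the nested ddply
by_depth <- arrange(summ, desc(depth))  # descending depth, as in the question
print(by_depth)
```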

If you are using dplyr, I would recommend the %.% operator, which makes the code read more intuitively:

    data(diamonds, package = 'ggplot2')
    library(dplyr)
    diamonds %.%
      group_by(color) %.%
      summarise(depth = mean(depth), table = mean(table)) %.%
      arrange(desc(depth))
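Later dplyr releases deprecated %.% in favour of the %>% pipe (which dplyr re-exports from magrittr), so on a current installation the same pipeline would be written with %>%. A minimal sketch on a hypothetical toy data frame rather than diamonds:

```r
library(dplyr)

# Hypothetical toy data standing in for diamonds
toy <- data.frame(
  color = c("D", "D", "E", "E", "J", "J"),
  depth = c(61.0, 62.0, 61.5, 61.7, 62.2, 61.6),
  table = c(57, 58, 56, 57, 58, 58),
  stringsAsFactors = FALSE
)

# Same steps as the %.% pipeline: group, summarise, sort by descending depth
result <- toy %>%
  group_by(color) %>%
  summarise(depth = mean(depth), table = mean(table)) %>%
  arrange(desc(depth))
print(result)
```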

A bit late to the party, but with dplyr the picture may be a little different. Borrowing crayola's data.table solution:

    library(microbenchmark)
    library(data.table)
    library(dplyr)
    data(diamonds, package = "ggplot2")
    dat1 <- microbenchmark(
      dtbl <- data.table(diamonds)[, list(depth = mean(depth), table = mean(table)),
                                   by = color][order(-depth)],
      dplyr_dtbl <- arrange(summarise(group_by(tbl_dt(diamonds), color),
                                      depth = mean(depth), table = mean(table)), -depth),
      dplyr_dtfr <- arrange(summarise(group_by(tbl_df(diamonds), color),
                                      depth = mean(depth), table = mean(table)), -depth),
      times = 20, unit = "ms"
    )

The results show that dplyr with tbl_dt is slower than the plain data.table method, while dplyr with a data.frame (tbl_df) is faster:

                 expr       min        lq    median        uq       max neval
           data.table  9.606571 10.968881 11.958644 12.675205 14.334525    20
     dplyr_data.table 13.553307 15.721261 17.494500 19.544840 79.771768    20
     dplyr_data.frame  4.643799  5.148327  5.887468  6.537321  7.043286    20

Note: I changed the expression names by hand so that the microbenchmark results are more readable.
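To put the comparison into numbers, the median timings from the table can be divided by the data.table median. This only reuses the figures reported above; it is not a new benchmark:

```r
# Median timings in milliseconds, copied from the microbenchmark table above
medians <- c(data.table = 11.958644,
             dplyr_data.table = 17.494500,
             dplyr_data.frame = 5.887468)

# Each median relative to the data.table median
ratios <- round(medians / medians[["data.table"]], 2)
print(ratios)
```

By median, dplyr on tbl_dt took roughly 1.5 times as long as plain data.table in that run, and dplyr on a data.frame roughly half as long.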


Source: https://habr.com/ru/post/886992/

