Find max for each group with dplyr in R

I am trying to find the carrier with the most flights in each of the summer months:

library(dplyr)

# Count flights per carrier and month for months 6 to 9
max_flights_all_c <- nycflights13::flights %>%
  group_by(carrier, month) %>%
  filter(month == 6 | month == 7 | month == 8 | month == 9) %>%
  summarise(n = n())

Now I get:

carrier  month     n
9E           7  1494
9E           8  1456
9E           9  1540
AA           6  2757
AA           7  2882
AA           8  2856
AA           9  2614
AS           6    60
AS           7    62
AS           8    62
AS           9    60
B6           6  4622
B6           7  4984

But I want only the row with the maximum n value for each month.

2 answers

After the summarise step, we group by "month" and use slice to keep the row with the maximum "n" in each group.

max_flights_all_c <- nycflights13::flights %>%
                          group_by(carrier,month)%>%
                          filter(month %in% 6:9) %>%
                          summarise(n = n()) %>%
                          group_by(month) %>%
                          slice(which.max(n))
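
To make the group_by + slice(which.max()) step easier to follow, here is a minimal sketch on a small made-up table. The name `counts` and its values are hypothetical and are not taken from nycflights13; they only stand in for the output of the summarise step.

```r
library(dplyr)

# Hypothetical per-carrier, per-month counts (illustrative values only)
counts <- tibble(
  carrier = c("AA", "UA", "AA", "UA"),
  month   = c(6L, 6L, 7L, 7L),
  n       = c(10L, 12L, 9L, 7L)
)

top_by_month <- counts %>%
  group_by(month) %>%
  slice(which.max(n)) %>%  # keep the first row with the largest n in each month
  ungroup()

print(top_by_month)
```

Note that which.max returns only the first maximum, so ties within a month are silently dropped. In dplyr 1.0.0 and later, slice_max(n, n = 1, with_ties = FALSE) should give the same rows, and with_ties = TRUE would keep tied carriers.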

Credit goes to @Henk for an updated data.table solution:

library(data.table)
setDT(nycflights13::flights)[month %between% c(6,9), .N, keyby = .(carrier, month)][, .SD[which.max(N)], month]

   month carrier    N
1:     6      UA 4975
2:     7      UA 5066
3:     8      UA 5124
4:     9      EV 4725

The original solution is in the answer's edit history.
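
The second step of the chain, .SD[which.max(N)] grouped by month, can be tried in isolation. This is a minimal sketch on made-up data: the table `counts` and its values are hypothetical, standing in for the result of the first `[...]` call that computes .N per carrier and month.

```r
library(data.table)

# Hypothetical per-carrier, per-month counts (illustrative values only)
counts <- data.table(
  carrier = c("AA", "UA", "AA", "UA"),
  month   = c(6L, 6L, 7L, 7L),
  N       = c(10L, 12L, 9L, 7L)
)

# For each month, .SD is the subset of rows in that group;
# which.max(N) picks the first row with the largest N.
top_by_month <- counts[, .SD[which.max(N)], by = month]

print(top_by_month)
```

The grouping column comes first in the result, followed by the remaining columns of the selected row. As with the dplyr version, ties within a month resolve to the first matching row.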

Microbenchmark (for those who care):

library(microbenchmark)
microbenchmark(henk=setDT(nycflights13::flights)[month %between% c(6,9), .N, keyby = .(carrier, month)][, .SD[which.max(N)], month],
               akrun=nycflights13::flights %>%
                 group_by(carrier,month)%>%
                 filter(month %in% 6:9) %>%
                 summarise(n = n()) %>%
                 group_by(month) %>%
                 slice(which.max(n)))

Unit: milliseconds
  expr       min       lq      mean    median        uq       max neval
  henk  5.612305  6.41659  7.416813  6.953205  7.515347  49.38172   100
 akrun 45.529320 47.51715 51.943065 48.882663 49.834458 221.39357   100

Source: https://habr.com/ru/post/1658339/

