Using the dplyr package and the function sample_frac, you can sample a percentage of the rows from each group. What I need instead is to sort the elements in each group first and then select the top x% from each group.
There is a function top_n, but it only lets me specify an absolute number of rows, and I need a relative value.
For example, the following data is grouped by gear and sorted by wt inside each group:
library(dplyr)
mtcars %>%
  select(gear, wt) %>%
  group_by(gear) %>%
  arrange(gear, wt)
gear wt
1 3 2.465
2 3 3.215
3 3 3.435
4 3 3.440
5 3 3.460
6 3 3.520
7 3 3.570
8 3 3.730
9 3 3.780
10 3 3.840
11 3 3.845
12 3 4.070
13 3 5.250
14 3 5.345
15 3 5.424
16 4 1.615
17 4 1.835
18 4 1.935
19 4 2.200
20 4 2.320
21 4 2.620
22 4 2.780
23 4 2.875
24 4 3.150
25 4 3.190
26 4 3.440
27 4 3.440
28 5 1.513
29 5 2.140
30 5 2.770
31 5 3.170
32 5 3.570
Now I would like to select the top 20% in each gear group.
It would be very nice if the solution could be integrated with the dplyr function group_by.
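To make the goal concrete, here is a minimal sketch of the kind of call I have in mind. It rests on two assumptions that are not part of the data above: that "top" means the largest wt values, and that dplyr 1.0.0 or later is installed, where slice_max() accepts a prop argument. The name top20 is only illustrative.

```r
library(dplyr)

# Sketch: keep the heaviest 20% of cars within each gear group.
# Assumption: "top" means largest wt; slice_max() needs dplyr >= 1.0.0.
top20 <- mtcars %>%
  select(gear, wt) %>%
  group_by(gear) %>%
  # prop is applied per group; the row count is rounded down
  slice_max(wt, prop = 0.2, with_ties = FALSE) %>%
  ungroup()

print(top20)
```

Because the per-group row count is rounded down, a group with fewer than five rows would return nothing at 20%. If rounding up is preferred, filter(row_number(desc(wt)) <= ceiling(0.2 * n())) after group_by(gear) might be an alternative.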