Extract rows with the highest and lowest values from a data frame

I am new to R; I use it mainly for visualizing statistics with the ggplot2 library. Now I am facing a data preparation problem.

I need to write a function that removes several rows (2, 5 or 10) with the highest and lowest values in a specified column from a data frame, puts them into another data frame, and does this for each combination of two factors (in my case, for every day and server).
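For reference, the whole task can be sketched in base R as one function. This is a minimal sketch, not code from the thread: `split_extremes()` and its arguments are hypothetical names. It ranks rows inside each group with a stable sort, so ties are broken by the original row order, and if a group has fewer than `2 * n` rows, all of its rows end up in `extremes`.

```r
# Hypothetical helper (not from any package): split a data frame into the
# n lowest/highest rows per group and everything else.
split_extremes <- function(df, value_col, group_cols, n = 2) {
  # Order rows by the grouping columns first, then by the value (ascending)
  keys <- c(unname(as.list(df[group_cols])), list(df[[value_col]]))
  df <- df[do.call(order, keys), , drop = FALSE]
  # One factor level per combination of the grouping columns
  grp <- interaction(df[group_cols], drop = TRUE)
  # Position of each row inside its group (1 = lowest value) and group size
  idx  <- seq_len(nrow(df))
  pos  <- ave(idx, grp, FUN = seq_along)
  size <- ave(idx, grp, FUN = length)
  # A row is "extreme" if it is among the n lowest or n highest of its group
  is_extreme <- pos <= n | pos > size - n
  list(extremes = df[is_extreme, , drop = FALSE],
       rest     = df[!is_extreme, , drop = FALSE])
}

res <- split_extremes(esoph, "ncontrols", c("agegp", "alcgp"), n = 2)
```

For the day/server case the call would look like `split_extremes(df, "value", c("day", "server"), n = 5)`, where those column names are placeholders.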

So far, I have done the following (an MWE using the built-in esoph dataset).

I sorted the data frame by the column of interest (ncontrols in the example):

 esoph <- esoph[with(esoph, order(-ncontrols)), ]
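Sorting on ncontrols alone is enough for by(), because by() keeps the row order inside each group. If you also want the table itself laid out block by block (handy for checking results by eye), order() accepts several keys. A small sketch; the name `esoph_sorted` is just illustrative:

```r
# Group rows by agegp, then alcgp, with the largest ncontrols first in each block
esoph_sorted <- esoph[order(esoph$agegp, esoph$alcgp, -esoph$ncontrols), ]
head(esoph_sorted, 10)
```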

I can display the first/last rows for each factor level (in this example, for each age group):

 by(data = esoph, INDICES = esoph$agegp, FUN = head, 3)
 by(data = esoph, INDICES = esoph$agegp, FUN = tail, 3)

Basically, I can see the highest and lowest values, but I don't know how to extract them into another data frame or how to remove them from the main one.
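One way to go from the by() output to actual data frames is to use row names as row identifiers: by() passes each group as a subset of the original frame, so the original row names survive. A minimal sketch for the single-factor case; `sorted`, `picked`, `extremes` and `remaining` are illustrative names:

```r
# Sort once, largest ncontrols first (as in the question)
sorted <- esoph[order(-esoph$ncontrols), ]
# The 3 largest and the 3 smallest rows within each age group
top_list    <- by(sorted, sorted$agegp, head, 3)
bottom_list <- by(sorted, sorted$agegp, tail, 3)
# by() keeps the original row names, so collect them as row identifiers
picked <- unique(c(unlist(lapply(top_list, rownames)),
                   unlist(lapply(bottom_list, rownames))))
extremes  <- esoph[picked, ]                            # selected rows
remaining <- esoph[setdiff(rownames(esoph), picked), ]  # everything else
```

unique() guards against a group with fewer than 6 rows, where head() and tail() would return overlapping rows.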

Also, the example above shows the top/bottom rows for each level of one factor (age group), but I actually need the highest and lowest rows for each combination of levels of two factors; in this example these could be agegp and alcgp.
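For the two-factor case, by() itself is enough: its INDICES argument may be a list (or data frame) of factors, and it then applies the function to every combination of levels. A short sketch; `top2` is an illustrative name:

```r
sorted <- esoph[order(-esoph$ncontrols), ]
# INDICES may be a list of factors; by() then applies head() to every
# agegp x alcgp combination (an empty combination gives NULL)
top2 <- by(sorted, sorted[c("agegp", "alcgp")], head, 2)
dim(top2)     # one cell per agegp level x alcgp level
top2[[1, 1]]  # 2 rows with the largest ncontrols in the first agegp/alcgp cell
```

The row-name approach used for one factor carries over unchanged, since `lapply(top2, rownames)` walks every cell of the result.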

I'm not even sure the steps above are the right approach. Would plyr work better? Any tips would be appreciated.

2 answers

Yes, you can use plyr as follows:

 library(plyr)
 # Random example data; no seed is set, so your numbers will differ
 esoph <- data.frame(agegp = sample(letters[1:2], 20, replace = TRUE),
                     alcgp = sample(LETTERS[1:2], 20, replace = TRUE),
                     ncontrols = runif(20))
 # Rows with the lowest and highest ncontrols in each agegp/alcgp group
 ddply(esoph, c("agegp", "alcgp"), function(x) {
   idx <- c(which.min(x$ncontrols), which.max(x$ncontrols))
   x[idx, , drop = FALSE]
 })
 #   agegp alcgp  ncontrols
 # 1     a     A 0.03091483
 # 2     a     A 0.88529790
 # 3     a     B 0.51265447
 # 4     a     B 0.86111649
 # 5     b     A 0.28372232
 # 6     b     A 0.61698401
 # 7     b     B 0.05618841
 # 8     b     B 0.89346943
 # All remaining rows (the same indices, negated)
 ddply(esoph, c("agegp", "alcgp"), function(x) {
   idx <- c(which.min(x$ncontrols), which.max(x$ncontrols))
   x[-idx, , drop = FALSE]
 })
 #    agegp alcgp ncontrols
 # 1      a     A 0.3745029
 # 2      a     B 0.7621474
 # 3      a     B 0.6319013
 # 4      b     A 0.3055078
 # 5      b     A 0.5146028
 # 6      b     B 0.3735615
 # 7      b     B 0.2528612
 # 8      b     B 0.4415205
 # 9      b     B 0.6868219
 # 10     b     B 0.3750102
 # 11     b     B 0.2279462
 # 12     b     B 0.1891052

There are probably many alternatives, for example using head and tail if your data is already sorted, but this should work.


Using base R:

 newesoph <- esoph[esoph$ncontrols == ave(esoph$ncontrols, list(esoph$agegp, esoph$alcgp), FUN = max) |
                   esoph$ncontrols == ave(esoph$ncontrols, list(esoph$agegp, esoph$alcgp), FUN = min), ]
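This answer builds only the extracted frame. Because the condition is an ordinary logical vector, negating it gives the remaining rows, which covers the "remove them from the main one" part of the question. A minimal sketch; `grp`, `keep` and `rest` are illustrative names. Note that `==` keeps every row tied for a group's minimum or maximum, whereas `which.min()`/`which.max()` in the plyr answer pick only the first such row.

```r
# Grouping factors: every agegp x alcgp combination
grp <- list(esoph$agegp, esoph$alcgp)
# TRUE for rows equal to their group's maximum or minimum ncontrols
keep <- esoph$ncontrols == ave(esoph$ncontrols, grp, FUN = max) |
        esoph$ncontrols == ave(esoph$ncontrols, grp, FUN = min)
newesoph <- esoph[keep, ]   # extreme rows (as in the answer)
rest     <- esoph[!keep, ]  # all remaining rows
```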

Source: https://habr.com/ru/post/1446422/

