How to calculate the average of the top 10% in R

My dataset contains several observations for different species, and each species has a different number of observations. I am looking for a quick way in R to calculate the average of the top 10% of the values of a given variable for each species.

I figured out how to get a fixed number of values per species (e.g. the top 20 values).

 clim6 <- setDT(range)[order(species, clim6), .SD[1:20], by = species]
 write.csv(Bioclimlo6, file = "clim6.csv")

I also know that there is a way to trim a dataset and take the mean of what remains, but I'm not sure how to trim off only the bottom 90%.

 mean(x, trim = 0, na.rm = FALSE) 
1 answer

The mean of the top 10% of the values, using base R:

 x = c(1:100, NA)
 mean(x[x >= quantile(x, 0.9, na.rm=TRUE)], na.rm=TRUE)
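Note that the quantile threshold keeps every value tied at the cut-off, so with repeated values it can average more than 10% of the observations. A minimal sketch of a count-based alternative, assuming "top 10%" means the `ceiling(0.1 * n)` largest non-missing values (`top_share_mean` is a hypothetical helper name, not a base R function):

```r
# Hypothetical helper: mean of the largest `share` fraction of the non-missing values.
top_share_mean <- function(x, share = 0.1) {
  x <- x[!is.na(x)]                      # drop missing values first
  if (length(x) == 0) return(NA_real_)   # nothing left to average
  k <- ceiling(share * length(x))        # how many values to keep (at least 1 when share > 0)
  mean(sort(x, decreasing = TRUE)[seq_len(k)])
}

x <- c(1:100, NA)
top_share_mean(x)   # averages the ten largest values, 91 to 100
```

For this `x` both approaches average the same ten values; they can differ when the data contain ties at the 90th percentile.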

The mean of the top 10% of the values, by a grouping variable:

 # Fake data
 dat = data.frame(x=1:100, group=rep(LETTERS[1:3], c(30,30,40)))

With dplyr

 library(dplyr)
 dat %>%
   group_by(group) %>%
   summarise(meanTop10pct = mean(x[x >= quantile(x, 0.9)]))
    group meanTop10pct
   (fctr)        (dbl)
 1      A         29.0
 2      B         59.0
 3      C         98.5

With data.table

 library(data.table)
 setDT(dat)[, list(meanTop10pct = mean(x[x >= quantile(x, 0.9)])), by=group]
    group meanTop10pct
 1:     A         29.0
 2:     B         59.0
 3:     C         98.5
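If neither package is available, the same grouped summary can be sketched in base R with `tapply`. The helper name `top10_mean` is made up for this illustration, and the `na.rm = TRUE` arguments are an addition for columns that may contain missing values (without them, `quantile` stops with an error when it meets an `NA`):

```r
# Same fake data as in the grouped examples
dat <- data.frame(x = 1:100, group = rep(LETTERS[1:3], c(30, 30, 40)))

# Hypothetical helper: mean of the values at or above the 90th percentile of one group
top10_mean <- function(v) {
  cutoff <- quantile(v, 0.9, na.rm = TRUE)
  mean(v[v >= cutoff], na.rm = TRUE)
}

# Apply the helper to x within each level of group
res <- tapply(dat$x, dat$group, top10_mean)
res
```

This should give the same three group means as the dplyr and data.table versions, returned as a named array rather than a data frame.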

Source: https://habr.com/ru/post/1246984/

