Aggregation of data in R with user-defined function

I have grouped data in R using the aggregate function.

 Avg <- aggregate(x$a, by = list(x$b, x$c), FUN = mean)

This gives me the average of all values of 'a', grouped by 'b' and 'c' of the data frame 'x'.

Now, instead of taking the average of all the values of 'a', I want to take the average of only the five largest values of 'a', grouped by 'b' and 'c'.

Data set example

  a b c
 10 G 3
 20 G 3
 22 G 3
 10 G 3
 15 G 3
 25 G 3
 30 G 3

After applying the aggregate function, this gives me

 Group.1 Group.2     x
       G       3 18.85

But I want to average only the 5 highest values of 'a':

 Group.1 Group.2     x
       G       3 22.40

I can't work out how to use the code below, which picks the largest values, inside the aggregate function:

 index <- order(vector, decreasing = TRUE)[1:5]
 vector[index]

Can anyone shed some light on how this is possible?

1 answer

You can order the data, take the top 5 values (using head), and then apply mean:

 aggregate(x$a, by = list(x$b, x$c), FUN = function(x) mean(head(x[order(-x)], 5)))
 #   Group.1 Group.2    x
 # 1       G       3 22.4
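To try this end to end, here is a minimal sketch that rebuilds the example data from the question and runs both calls. The names avg_all and avg_top5 are only illustrative, and the data frame is assumed to hold 'b' as a character column and 'c' as a number.

```r
# Rebuild the example data from the question
# (column types are an assumption: b as character, c as numeric)
x <- data.frame(
  a = c(10, 20, 22, 10, 15, 25, 30),
  b = "G",
  c = 3
)

# Mean of all seven values per group (the call from the question)
avg_all <- aggregate(x$a, by = list(x$b, x$c), FUN = mean)

# Mean of the 5 largest values per group:
# order(-v) sorts descending, head(..., 5) keeps the first five
avg_top5 <- aggregate(x$a, by = list(x$b, x$c),
                      FUN = function(v) mean(head(v[order(-v)], 5)))

print(avg_all)   # x column is 132 / 7
print(avg_top5)  # x column is 22.4
```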

If you want to do this with a separate, user-defined function, I would do it like this:

 myfunc <- function(vec, n) {
   mean(head(vec[order(-vec)], n))
 }
 aggregate(x$a, by = list(x$b, x$c), FUN = function(z) myfunc(z, 5))
 #   Group.1 Group.2    x
 # 1       G       3 22.4
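One thing worth checking is what happens when a group has fewer than n rows. The sketch below uses made-up data (the data frame y and the extra group "H" are hypothetical, not from the question): head() simply returns what is available, so the mean is taken over the whole group.

```r
myfunc <- function(vec, n) {
  mean(head(vec[order(-vec)], n))
}

# Hypothetical data: group "G" is the question's example,
# group "H" has only two rows (fewer than n = 5)
y <- data.frame(
  a = c(10, 20, 22, 10, 15, 25, 30, 4, 8),
  b = c(rep("G", 7), rep("H", 2)),
  c = 3
)

res <- aggregate(y$a, by = list(y$b, y$c), FUN = function(z) myfunc(z, 5))
print(res)
# Group "G": mean of its 5 largest values, 22.4
# Group "H": mean of both of its values, 6
```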

I really prefer the formula interface of aggregate, which looks like this (I also use with() so that I can refer to the column names directly without writing x$ each time):

 with(x, aggregate(a ~ b + c, FUN = function(z) myfunc(z, 5)))
 #   b c    a
 # 1 G 3 22.4

In this function, the parameter z receives, in turn, each vector of 'a' values for one combination of the groups b and c. Does that make more sense now? Also note that this does not return an integer but a numeric (decimal) value, 22.4 in this case.
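As a possible variant, aggregate forwards extra arguments to FUN, so the anonymous wrapper can be dropped and n passed directly. This is a sketch on the question's data (rebuilt here under the assumption that 'b' is character and 'c' is numeric), using the data argument in place of with(); it also checks the type of the result.

```r
myfunc <- function(vec, n) {
  mean(head(vec[order(-vec)], n))
}

# Example data from the question (column types are an assumption)
x <- data.frame(
  a = c(10, 20, 22, 10, 15, 25, 30),
  b = "G",
  c = 3
)

# n = 5 is not an argument of aggregate itself, so it is passed on to myfunc
res <- aggregate(a ~ b + c, data = x, FUN = myfunc, n = 5)
print(res)

# The aggregated column is a double, not an integer
is_num <- is.numeric(res$a)  # TRUE
is_int <- is.integer(res$a)  # FALSE
```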


Source: https://habr.com/ru/post/974168/

