Using the apply function for group means

Question


My data frame:

df<-data.frame(ID = rep(c("no","bo", "fo", "to"), each = 3), matrix(sample(60), ncol = 5))
names(df) <- c("ID", letters[1:5])

I calculated the mean of each column for each ID group using:

 df.n.mean <- aggregate(. ~ ID, df, function(x) c(mean = mean(x)))

I would like to know whether I can use an apply-family function instead of aggregate(). Would that speed up the process?
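For reference, an apply-family version of the same computation might look like the sketch below: split() the value columns by ID, run colMeans() on each piece with lapply(), and bind the rows back together. The set.seed() call and the object names are illustrative additions. Whether this is faster than aggregate() depends on the data, so it is worth timing on your own data rather than assuming a speed-up.

```r
# Example data as in the question; set.seed() added only for reproducibility
set.seed(1)
df <- data.frame(ID = rep(c("no", "bo", "fo", "to"), each = 3),
                 matrix(sample(60), ncol = 5))
names(df) <- c("ID", letters[1:5])

# Formula interface, as in the question (plain mean is enough here)
agg <- aggregate(. ~ ID, df, mean)

# Apply-family alternative: one colMeans() call per ID group
means_list <- lapply(split(df[-1], df$ID), colMeans)
apply_means <- do.call(rbind, means_list)  # matrix, one row per ID (sorted)
apply_means
```

Both versions return the groups in sorted order (bo, fo, no, to); the apply version gives a numeric matrix with the IDs as row names instead of a data frame with an ID column.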


Al14 · Sep 08 '15 at 20:23


2 answers

dplyr's summarise_each() can replace aggregate():

library(dplyr)
newdf <- df %>% group_by(ID) %>% summarise_each(funs(mean))

      ID        a        b        c        d        e
  (fctr)    (dbl)    (dbl)    (dbl)    (dbl)    (dbl)
1     bo 48.66667 32.00000 22.66667 33.33333 33.33333
2     fo 19.33333 15.00000 36.66667 25.33333 23.00000
3     no 35.00000 22.33333 37.00000 20.66667 31.00000
4     to 41.33333 39.00000 20.33333 37.00000 37.00000
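summarise_each() and funs() have since been deprecated in dplyr. A sketch of the same grouped mean with the newer across() interface, assuming dplyr 1.0.0 or later (set.seed() is added only so the example is reproducible):

```r
library(dplyr)

set.seed(1)
df <- data.frame(ID = rep(c("no", "bo", "fo", "to"), each = 3),
                 matrix(sample(60), ncol = 5))
names(df) <- c("ID", letters[1:5])

# across(everything(), mean) applies mean() to every non-grouping column
newdf <- df %>%
  group_by(ID) %>%
  summarise(across(everything(), mean))
newdf
```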


Cleb · Sep 08 '15 at 20:38

Rich Scriven · Accepted Answer · 2015-09-08T20:58:00+0000

I think the fastest replacement for aggregate() would be data.table:

library(data.table)
( dt <- setDT(df)[, lapply(.SD, mean), by = ID] )
#    ID         a        b        c        d        e
# 1: no 25.000000 26.00000 24.66667 39.00000 39.66667
# 2: bo 40.666667 25.33333 31.33333 37.00000 19.33333
# 3: fo  5.333333 28.00000 53.33333 11.66667 29.33333
# 4: to 30.666667 47.33333 27.00000 41.33333 28.00000
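Two details worth knowing here: setDT() converts df to a data.table in place, and `by = ID` keeps the groups in order of first appearance (no, bo, fo, to above), whereas aggregate() sorts them; `keyby = ID` returns sorted groups. To check the speed claim yourself, a rough timing sketch on larger synthetic data follows. The sizes and the `big` data frame are made up for illustration, and timings will vary by machine.

```r
library(data.table)

set.seed(42)
n_groups <- 1000
n_rows <- 1e5
# Synthetic data: an ID column plus five numeric columns a..e
big <- data.frame(ID = sample(sprintf("g%04d", seq_len(n_groups)), n_rows,
                              replace = TRUE),
                  matrix(runif(n_rows * 5), ncol = 5))
names(big) <- c("ID", letters[1:5])

# Base R aggregate(), as in the question
t_agg <- system.time(res_agg <- aggregate(. ~ ID, big, mean))

# data.table; as.data.table() makes a copy, so 'big' is left untouched
big_dt <- as.data.table(big)
t_dt <- system.time(res_dt <- big_dt[, lapply(.SD, mean), keyby = ID])

# Elapsed seconds for each approach
print(c(aggregate = t_agg[["elapsed"]], data.table = t_dt[["elapsed"]]))
```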

To subtract rows (one ID's means from another's), we could write a function and use it with Map():

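# Subtract the mean row of group y from the mean row of group x
# (-1 with with = FALSE drops the ID column)
f <- function(x, y) {
    dt[ID == x, -1, with = FALSE] - dt[ID == y, -1, with = FALSE]
}
rbindlist(Map(f, c("bo", "fo", "to", "to"), c("no", "no", "bo", "fo")))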
#            a          b          c          d          e
# 1:  15.66667 -0.6666667   6.666667  -2.000000 -20.333333
# 2: -19.66667  2.0000000  28.666667 -27.333333 -10.333333
# 3: -10.00000 22.0000000  -4.333333   4.333333   8.666667
# 4:  25.33333 19.3333333 -26.333333  29.666667  -1.333333
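f() reads the global `dt`. A slightly more self-contained sketch passes the table in as an argument instead; the helper name `diff_groups` and its arguments are hypothetical, and the `..cols` column selection assumes a reasonably recent data.table (1.10.2 or later).

```r
library(data.table)

set.seed(1)
df <- data.frame(ID = rep(c("no", "bo", "fo", "to"), each = 3),
                 matrix(sample(60), ncol = 5))
names(df) <- c("ID", letters[1:5])
dt <- setDT(df)[, lapply(.SD, mean), by = ID]

# Hypothetical helper: subtract the mean row of group y from that of group x
diff_groups <- function(tab, x, y) {
  cols <- setdiff(names(tab), "ID")            # value columns only
  tab[ID == x, ..cols] - tab[ID == y, ..cols]
}

res <- rbindlist(Map(function(x, y) diff_groups(dt, x, y),
                     c("bo", "fo", "to", "to"), c("no", "no", "bo", "fo")))
res
```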

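Note that f() relies on the data.table `dt` created above. Also, your numbers will differ from mine because sample() draws different values on each run.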

To label each row with the pair that was subtracted, we can set row names:

A <- c("bo", "fo", "to", "to")
B <- c("no", "no", "bo", "fo")
df2 <- as.data.frame(rbindlist(Map(f, A, B)))
rownames(df2) <- paste(A, B, sep = "-")
df2
#               a          b          c          d          e
# bo-no  15.66667 -0.6666667   6.666667  -2.000000 -20.333333
# fo-no -19.66667  2.0000000  28.666667 -27.333333 -10.333333
# to-bo -10.00000 22.0000000  -4.333333   4.333333   8.666667
# to-fo  25.33333 19.3333333 -26.333333  29.666667  -1.333333
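If a plain matrix is acceptable, the same labelled differences can be taken in base R without a helper function: build a matrix of group means whose row names are the IDs (here with sapply() over columns and tapply() within each column), then index it by name and subtract. This is an alternative sketch, not the answer's method; object names are illustrative.

```r
set.seed(1)
df <- data.frame(ID = rep(c("no", "bo", "fo", "to"), each = 3),
                 matrix(sample(60), ncol = 5))
names(df) <- c("ID", letters[1:5])

# For each column, tapply() computes the mean per ID; sapply() binds the
# results into a matrix with IDs as row names and a..e as column names
m <- sapply(df[-1], tapply, df$ID, mean)

A <- c("bo", "fo", "to", "to")
B <- c("no", "no", "bo", "fo")
# Indexing by name returns rows in the order given, so row i of the
# result is m[A[i], ] - m[B[i], ]
d <- m[A, , drop = FALSE] - m[B, , drop = FALSE]
rownames(d) <- paste(A, B, sep = "-")
d
```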
