Calculating multiple aggregates using lapply (.SD, ...) in data.table R package

Question


I would like to calculate several aggregates using data.table's lapply(.SD, ...) idiom, but my attempts either raise an error or combine the results like rbind (stacked as rows) instead of like cbind (side by side as columns).

For example, to get the mean and median mpg in mtcars by cyl, you can do the following:

library(data.table)
mtcars.dt <- data.table(mtcars)
mtcars.dt[, list(mpg.mean=mean(mpg), mpg.median=median(mpg)), by="cyl"]
# Result:
#    cyl mpg.mean mpg.median
# 1:   6    19.74       19.7
# 2:   4    26.66       26.0
# 3:   8    15.10       15.2

But the lapply(.SD, ...) approach either stacks the results as rows (rbind):

mtcars.dt[, lapply(.SD, function(x) list(mean(x), median(x))),
          by="cyl", .SDcols=c("mpg")]
# Result:
#    cyl              mpg
# 1:   6 19.7428571428571
# 2:   6             19.7
# 3:   4 26.6636363636364
# 4:   4               26
# 5:   8             15.1
# 6:   8             15.2
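To see why the rows get stacked, it may help to inspect what the j expression returns for a single group. The sketch below uses base R only and assumes that a one-column data frame subset (the hypothetical name `sd_group`) is a reasonable stand-in for `.SD` in the cyl == 6 group.

```r
# Stand-in for .SD of the cyl == 6 group (hypothetical name sd_group).
sd_group <- mtcars[mtcars$cyl == 6, "mpg", drop = FALSE]

# Same function as in the question: a list of two values per column.
out <- lapply(sd_group, function(x) list(mean(x), median(x)))

# A list holding ONE element (mpg), which is itself a list of length 2.
str(out)
```

Because j yields a single column whose value is a two-element list, data.table returns two rows per group rather than two columns. Extracting that inner list (for example with [[1]]) would give j two separate elements, which become two columns.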

Or fails outright:

mtcars.dt[, lapply(.SD, list(mean, median)),
          by="cyl", .SDcols=c("mpg")]
# Result:
# Error in `[.data.table`(mtcars.dt, , lapply(.SD, list(mean, median)),  :
#   attempt to apply non-function
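Most likely the error comes from lapply expecting FUN to be a single function, while list(mean, median) is a list of functions. A minimal base R sketch of one workaround, assuming a hypothetical named list `funs`, is to loop over the functions instead of passing the list as FUN:

```r
# Hypothetical named list of summary functions.
funs <- list(mean = mean, median = median)

# Passing the list itself as FUN fails: lapply needs a single function.
failed <- inherits(
  tryCatch(lapply(mtcars["mpg"], funs), error = function(e) e),
  "error"
)
print(failed)

# Workaround: iterate over the functions and apply each one to the column.
res <- sapply(funs, function(f) f(mtcars$mpg))
print(res)
```

Here sapply keeps the names of `funs`, so the result is a named vector with one entry per statistic.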

EDIT: As Senor O noted, some of the answers provided work for my example only because it has a single aggregation column. An ideal solution would work for multiple columns, for example replacing the following:

mtcars.dt[, list(mpg.mean=mean(mpg), mpg.median=median(mpg),
                 hp.mean=mean(hp), hp.median=median(hp)), by="cyl"]
# Result:
#    cyl mpg.mean mpg.median hp.mean hp.median
# 1:   6    19.74       19.7  122.29     110.0
# 2:   4    26.66       26.0   82.64      91.0
# 3:   8    15.10       15.2  209.21     192.5

Ideally the columns to aggregate would be chosen with .SDcols, which AFAIK is the way to select columns for lapply(.SD, ...).


r data.table

Max Ghenis, 10 Jun '14 at 22:03


Self-answer :) Using a separate lapply call for each aggregate works:

mtcars.dt[, list(mpg.mean=lapply(.SD, mean), mpg.median=lapply(.SD, median)),
          by="cyl", .SDcols=c("mpg")]
# Result:
#    cyl mpg.mean mpg.median
# 1:   6    19.74       19.7
# 2:   4    26.66       26.0
# 3:   8    15.10       15.2


Max Ghenis, 10 Jun '14 at 22:07

eddi · Accepted Answer · 2014-06-10T22:23:33+0000

You are missing [[1]] or $mpg:

mtcars.dt[, lapply(.SD, function(x) list(mean(x), median(x)))[[1]],
            by="cyl", .SDcols=c("mpg")]
#or
mtcars.dt[, lapply(.SD, function(x) list(mean(x), median(x)))$mpg,
            by="cyl", .SDcols=c("mpg")]
#   cyl       V1   V2
#1:   6 19.74286 19.7
#2:   4 26.66364 26.0
#3:   8 15.10000 15.2
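If descriptive column names are wanted instead of V1 and V2, one option is to name the extracted list before returning it. This sketch assumes the data.table package is installed and uses stats::setNames; the names mpg.mean and mpg.median are simply chosen to mirror the question. Like the [[1]] version, it only handles a single .SDcols column.

```r
library(data.table)

mtcars.dt <- data.table(mtcars)

# [[1]] pulls out the inner list(mean, median) for the single .SDcols column;
# setNames() then replaces the default V1/V2 with descriptive column names.
res <- mtcars.dt[, setNames(lapply(.SD, function(x) list(mean(x), median(x)))[[1]],
                            c("mpg.mean", "mpg.median")),
                 by = "cyl", .SDcols = "mpg"]
print(res)
```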

For the more general case with several columns, flatten the nested list instead:

mtcars.dt[, as.list(unlist(lapply(.SD, function(x) list(mean=mean(x),
                                                        median=median(x))))),
            by="cyl", .SDcols=c("mpg", "hp")]
#    cyl mpg.mean mpg.median hp.mean hp.median
# 1:   6    19.74       19.7  122.29     110.0
# 2:   4    26.66       26.0   82.64      91.0
# 3:   8    15.10       15.2  209.21     192.5
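The same flattening idea can be driven by a named list of functions, so adding another statistic only means adding an entry to that list. A sketch, assuming data.table is installed; `funs` is a hypothetical name, and unlist() builds each column name by joining the column name and the function name with a dot.

```r
library(data.table)

mtcars.dt <- data.table(mtcars)

# Hypothetical named list of summary functions; names become column suffixes.
funs <- list(mean = mean, median = median)

# For each .SD column, apply every function in funs; unlist() flattens the
# nested list into a named vector (mpg.mean, mpg.median, hp.mean, ...),
# and as.list() turns it back into one output column per element.
res <- mtcars.dt[, as.list(unlist(lapply(.SD, function(x) lapply(funs, function(f) f(x))))),
                 by = "cyl", .SDcols = c("mpg", "hp")]
print(res)
```

Since unlist() coerces everything to a common type, this assumes every summary function returns a single numeric value.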

(or, alternatively, as.list(sapply(.SD, ...)))
