Summary statistics of several data frames in a list

Question

Suppose I have this list:

 set.seed(123)
 thelist <- list(a = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                 b = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                 c = data.frame(x1 = rnorm(10), x2 = rnorm(10)))

If I wanted to calculate the mean of each column in each list element, I could do it with the following code:

 sapply(do.call("rbind",thelist),mean)

How can I calculate the standard deviation, again for each column in each list element (a to c), given that there is no equivalent function for sd (at least to my knowledge)?

Any suggestions would be appreciated.

function list r

B. Davis May 07 '15 at 10:58 p.m.

3 answers

First, I would make the data frames stackable by turning each list name into a column:

 for (i in seq_along(thelist)) thelist[[i]]$dfname <- names(thelist)[i]

Then stack the data frames and aggregate using data.table:

 require(data.table)
 DT <- rbindlist(thelist)
 DT[, lapply(.SD, mean), by = dfname]

which gives

    dfname           x1        x2
 1:      a  0.074625644 0.2086220
 2:      b -0.424558873 0.3220446
 3:      c -0.008715537 0.2216860
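As a possible shortcut, the naming loop and the stacking step can be folded into a single call. The sketch below assumes a data.table version whose rbindlist supports the idcol argument. It rebuilds thelist from the question so that it runs on its own, and it uses sd in place of mean. The name sd_by_df is only illustrative.

```r
library(data.table)

# Recreate the example list from the question
set.seed(123)
thelist <- list(a = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                b = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                c = data.frame(x1 = rnorm(10), x2 = rnorm(10)))

# idcol stores the name of each list element in a new first column "dfname"
DT <- rbindlist(thelist, idcol = "dfname")

# Standard deviation of every remaining column, per original data frame
sd_by_df <- DT[, lapply(.SD, sd), by = dfname]
print(sd_by_df)
```

This leaves thelist itself unmodified, which matters if the list is reused later without the extra dfname column.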

You can also consider the summary function, although it is awkward here:

 DT[, as.list(unlist(lapply(.SD, summary))), by = dfname]
 #    dfname x1.Min. x1.1st Qu. x1.Median   x1.Mean x1.3rd Qu. x1.Max. x2.Min. x2.1st Qu. x2.Median x2.Mean x2.3rd Qu. x2.Max.
 # 1:      a  -1.265    -0.5318  -0.07983  0.074630    0.37800   1.715 -1.9670   -0.32690    0.3803  0.2086     0.6505  1.7870
 # 2:      b  -1.687    -1.0570  -0.67700 -0.424600    0.06054   1.254 -0.3805   -0.23680    0.4902  0.3220     0.7883  0.8951
 # 3:      c  -1.265    -0.6377  -0.30540 -0.008716    0.56410   2.169 -1.5490   -0.03929    0.1699  0.2217     0.5018  1.5160

Finally, borrowing from an older answer of mine, you can write your own summary-statistics function:

 summaryfun <- function(x) list(mean = mean(x), sd = sd(x))
 DT[, as.list(unlist(lapply(.SD, summaryfun))), by = dfname]
 #    dfname      x1.mean     x1.sd   x2.mean     x2.sd
 # 1:      a  0.074625644 0.9537841 0.2086220 1.0380734
 # 2:      b -0.424558873 0.9308092 0.3220446 0.5273024
 # 3:      c -0.008715537 1.0825182 0.2216860 0.8564451

Frank May 07, '15 at 23:13

You can combine your data as you yourself suggested, and then aggregate as follows:

 thelist_named <- Map(cbind, thelist, nam = names(thelist))
 thelist_binded <- do.call(rbind, thelist_named)

Aggregation part:

 my_summary <- function(x) {
   c(mean = mean(x), sd = sd(x))
 }
 aggregate(. ~ nam, thelist_binded, my_summary)

Result:

   nam      x1.mean       x1.sd   x2.mean     x2.sd
 1   a  0.074625644 0.953784051 0.2086220 1.0380734
 2   b -0.424558873 0.930809213 0.3220446 0.5273024
 3   c -0.008715537 1.082518163 0.2216860 0.8564451
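One detail worth knowing about this approach: when the summary function returns a vector, aggregate stores each summarised variable as a matrix column, so the printed result looks like five columns while the data frame really has three. A minimal sketch of how one might flatten it, rebuilding the data from the question so it runs on its own (the names stacked, agg and flat are only illustrative):

```r
# Recreate the example list from the question and stack it with a name column
set.seed(123)
thelist <- list(a = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                b = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                c = data.frame(x1 = rnorm(10), x2 = rnorm(10)))
stacked <- do.call(rbind, Map(cbind, thelist, nam = names(thelist)))

my_summary <- function(x) c(mean = mean(x), sd = sd(x))
agg <- aggregate(. ~ nam, stacked, my_summary)

# agg has three columns: nam, plus the matrix columns x1 and x2
str(agg)

# Passing the columns back through data.frame() expands each matrix
# into ordinary columns named x1.mean, x1.sd, x2.mean, x2.sd
flat <- do.call(data.frame, agg)
names(flat)
```

After flattening, individual statistics can be accessed directly, for example flat$x1.sd.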

Rentrop May 07 '15 at 23:39

Rentrop · Accepted Answer · 2015-05-07T23:26:04+0000

A base R solution uses sapply twice.

For the mean:

 t(sapply(thelist, sapply, mean))

Result

             x1        x2
 a  0.074625644 0.2086220
 b -0.424558873 0.3220446
 c -0.008715537 0.2216860

If you want both the mean and the standard deviation:

 my_summary <- function(x) {
   c(mean = mean(x), sd = sd(x))
 }
 as.data.frame(lapply(thelist, sapply, my_summary))

Result:

            a.x1     a.x2       b.x1      b.x2         c.x1      c.x2
 mean 0.07462564 0.208622 -0.4245589 0.3220446 -0.008715537 0.2216860
 sd   0.95378405 1.038073  0.9308092 0.5273024  1.082518163 0.8564451
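For the standard deviation alone, which is what the question asked for, the same nested sapply pattern should work with sd swapped in for mean. A minimal sketch, rebuilding the list from the question so it runs on its own (the name sd_by_df is only illustrative):

```r
# Recreate the example list from the question
set.seed(123)
thelist <- list(a = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                b = data.frame(x1 = rnorm(10), x2 = rnorm(10)),
                c = data.frame(x1 = rnorm(10), x2 = rnorm(10)))

# Inner sapply: sd of each column within one data frame
# Outer sapply: repeat that for every data frame in the list
# t(): one row per data frame, one column per variable
sd_by_df <- t(sapply(thelist, sapply, sd))
print(sd_by_df)
```

The result is a plain numeric matrix with rows a, b, c and columns x1, x2, so a single value can be read with sd_by_df["b", "x2"].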
