How to apply a function to a nested list?

I need to get the maximum of a variable in a nested list. For a specific station number s and a specific member m, mylist[[s]][[m]] looks like this:

```
station        date.time member bias
   6019 2011-08-06 12:00 mbr003   86
   6019 2011-08-06 13:00 mbr003   34
```

For each station I need to get the maximum bias across all members. For s = 3 I managed to do it like this:

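```r
library(plyr)
var1 <- mylist[[3]]                  # list of data frames, one per member
var2 <- lapply(var1, `[`, 4)         # keep only the 4th column (bias)
var3 <- laply(var2, .fun = max)      # max bias for each member
max.value <- max(var3)               # max bias for the station
```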

Is there a way to avoid hard-coding the column number 4 in the second line and use the variable name $bias in lapply instead, or is this the best way to do it?

3 answers

You can use [ with the column names of a data frame as well as with their indices. So foo[4] gives the same result as foo["bias"] (assuming bias is the name of the fourth column).
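A minimal check of that equivalence, assuming a small hypothetical data frame foo shaped like one element of mylist from the question (the two rows are copied from the sample above):

```r
# Hypothetical data frame standing in for one element of mylist[[s]]
foo <- data.frame(
  station   = c(6019, 6019),
  date.time = c("2011-08-06 12:00", "2011-08-06 13:00"),
  member    = c("mbr003", "mbr003"),
  bias      = c(86, 34)
)

by_index <- foo[4]       # select the fourth column by position
by_name  <- foo["bias"]  # select the same column by name

# Both are one-column data frames with identical contents
print(identical(by_index, by_name))
```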

$bias is not really the name of this column; the name is just bias. Like [, $ is a function in R, and it is used to access columns of data frames (among other things).
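To see that $ is an ordinary function, it can be called in prefix form. The sketch below (again using a hypothetical foo) also shows why $ cannot take a column name stored in a variable, which is why a name passed along as a string needs [ or [[ instead:

```r
foo <- data.frame(member = c("mbr003", "mbr003"), bias = c(86, 34))

v1 <- foo$bias          # usual infix form
v2 <- `$`(foo, bias)    # the same call written as a function call
v3 <- foo[["bias"]]     # [[ with the column name as a string

col <- "bias"
v4 <- foo[[col]]        # [[ evaluates col, so this finds the bias column
v5 <- foo$col           # $ does not evaluate col; it looks for a column named "col" and returns NULL
```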

But now I'm going to go out on a limb and offer some suggestions about your data structure. If each element of your nested list holds the data for a unique combination of station and member, here is a simplified toy version of your data:

```r
dat <- expand.grid(station = rep(1:3, each = 2), member = rep(1:3, each = 2))
dat$bias <- sample(50:100, 36, replace = TRUE)  # random, so your numbers will differ
tmp <- split(dat, dat$station)
tmp <- lapply(tmp, function(x) { split(x, x$member) })

> tmp
$`1`
$`1`$`1`
  station member bias
1       1      1   87
2       1      1   82
7       1      1   51
8       1      1   60

$`1`$`2`
   station member bias
13       1      2   64
14       1      2  100
19       1      2   68
20       1      2   74

etc.
```

tmp is a list of length three, and each of its elements is itself a list of length three whose elements are data frames like the ones shown above.
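The nesting described above can be confirmed directly. This sketch rebuilds the toy data from the answer's code (set.seed() is added only so the example is reproducible) and inspects the shape of tmp:

```r
set.seed(1)  # added only to make the random toy data reproducible
dat <- expand.grid(station = rep(1:3, each = 2), member = rep(1:3, each = 2))
dat$bias <- sample(50:100, 36, replace = TRUE)
tmp <- split(dat, dat$station)
tmp <- lapply(tmp, function(x) split(x, x$member))

n_stations <- length(tmp)               # outer list: one element per station
n_members  <- sapply(tmp, length)       # inner lists: one element per member
leaf_class <- class(tmp[["1"]][["1"]])  # each leaf is a data frame
leaf_rows  <- nrow(tmp[["1"]][["1"]])   # rows for station 1, member 1
print(n_members)
```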

It is much easier to work with data like this when it is stored as a single data frame. You'll notice that I built it that way first (dat) and then split it twice. In your case, you can rbind everything back together with the following code:

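```r
# Bind the members within each station, then bind the stations together
newDat <- do.call(rbind, lapply(tmp, function(x) { do.call(rbind, x) }))
rownames(newDat) <- NULL  # drop the composite row names created by rbind
```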
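A possible variant, swapped in here rather than taken from the answer: flatten the list of lists by one level with unlist(recursive = FALSE) and call rbind once. The sketch rebuilds the toy data with a seed and checks that both approaches produce the same rows:

```r
set.seed(1)
dat <- expand.grid(station = rep(1:3, each = 2), member = rep(1:3, each = 2))
dat$bias <- sample(50:100, 36, replace = TRUE)
tmp <- lapply(split(dat, dat$station), function(x) split(x, x$member))

# Answer's approach: bind members within each station, then bind stations
newDat <- do.call(rbind, lapply(tmp, function(x) do.call(rbind, x)))
rownames(newDat) <- NULL

# Variant: flatten one level into a list of 9 data frames, then bind once
flat    <- unlist(tmp, recursive = FALSE)
newDat2 <- do.call(rbind, flat)
rownames(newDat2) <- NULL
```

Both keep the station-major, member-minor order, so the columns line up row for row.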

In this form, these types of calculations are much simpler:

```r
library(plyr)

# Find the max bias for each unique station+member
ddply(newDat, .(station, member), summarise, mx = max(bias))
  station member  mx
1       1      1  87
2       1      2 100
3       1      3  91
4       2      1  94
5       2      2  88
6       2      3  89
7       3      1  74
8       3      2  88
9       3      3  99

# Or maybe the max bias for each station across all members
ddply(newDat, .(station), summarise, mx = max(bias))
  station  mx
1       1 100
2       2  94
3       3  99
```
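If you would rather not depend on plyr, base R's aggregate() and tapply() compute the same two summaries. This is an alternative to the ddply calls, not what the answer used; newDat here is a hypothetical stand-in built directly from seeded toy data with the same columns:

```r
set.seed(1)
newDat <- expand.grid(station = rep(1:3, each = 2), member = rep(1:3, each = 2))
newDat$bias <- sample(50:100, 36, replace = TRUE)

# Max bias for each unique station+member pair (one row per pair)
by_pair <- aggregate(bias ~ station + member, data = newDat, FUN = max)

# Max bias for each station across all members (one entry per station)
by_station <- tapply(newDat$bias, newDat$station, max)
print(by_pair)
print(by_station)
```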

Here is another approach, using nested lapply calls:

 lapply(tmp, function(x) lapply(lapply(x, '[[', 'bias'), max)) 
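The nested lapply returns one maximum per station/member pair. The question asks for the maximum per station across all members, which takes one more max() per station. A sketch using sapply, with tmp rebuilt from seeded toy data; the same call should work on the question's mylist if each of its elements is a list of data frames with a bias column (an assumption about the real data):

```r
set.seed(1)
dat <- expand.grid(station = rep(1:3, each = 2), member = rep(1:3, each = 2))
dat$bias <- sample(50:100, 36, replace = TRUE)
tmp <- lapply(split(dat, dat$station), function(x) split(x, x$member))

# Per station and member, as in the answer: a list of lists of single numbers
per_member <- lapply(tmp, function(x) lapply(lapply(x, '[[', 'bias'), max))

# Per station across all members: max of the member maxima
per_station <- sapply(tmp, function(x) max(sapply(x, function(d) max(d[["bias"]]))))
print(per_station)
```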

You may need to use [[ instead of [, but either should work with the column name given as a string (just don't use $). Try:

```r
var2 <- lapply(var1, '[', 'bias')
```

or

```r
var2 <- lapply(var1, '[[', 'bias')
```

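depending on what the elements of var1 are. If they are data frames, both work: [ returns a one-column data frame and [[ returns the column as a plain vector. If they are plain lists, [ returns a sub-list that max() cannot handle, so use [[.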
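A minimal sketch of the difference, assuming var1 is a hypothetical list of two member data frames (the mbr003 values come from the question; the mbr004 values are made up for illustration). Either form can replace the hard-coded column 4 in the question's code:

```r
# Hypothetical stand-in for mylist[[3]]: one data frame per member
var1 <- list(
  mbr003 = data.frame(member = "mbr003", bias = c(86, 34)),
  mbr004 = data.frame(member = "mbr004", bias = c(12, 91))
)

var2a <- lapply(var1, '[', 'bias')   # each element is a one-column data frame
var2b <- lapply(var1, '[[', 'bias')  # each element is a plain numeric vector

# max() accepts either form, so both give the station-wide maximum
max.a <- max(sapply(var2a, max))
max.b <- max(sapply(var2b, max))
print(c(max.a, max.b))
```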


Source: https://habr.com/ru/post/896805/

