Problems splitting a data frame into a nested list

I am new to R and I have a problem splitting a very large data frame into a nested list. I tried to find help on the Internet, but I was unsuccessful.

I have a simplified example of how my data is organized:

Headers:

1. "station" (number)
2. "date.str" (date string)
3. "member"
4. "forecast.time"
5. "data1"

I'm not sure that my sample data will display correctly, but if so, it looks like this:

    station date.str member forecast.time data1
    6019    20110805 mbr000 06            77
    6031    20110805 mbr000 06            28
    6071    20110805 mbr000 06            45
    6019    20110805 mbr001 12            22
    6019    20110806 mbr024 18            66

I want to split the large data frame into a nested list by "station", "member", "date.str" and "forecast.time", so that mylist[[c(s, m, d, t)]] contains a data frame with the data for station "s", member "m", date.str "d" and forecast time "t", and the values of s, m, d and t are preserved.

My code is:

    data.st <- list()
    data.st.member <- list()
    data.st.member.dato <- list()
    data.st <- split(mydata, mydata$station)
    data.st.member <- lapply(data.st, FUN = fsplit.member)

(fsplit.member is a function I wrote to split by "member".)

    # Loop over station number:
    for (s in 1:S) {
      # Loop over members:
      for (m in 1:length(members)) {
        tmp <- split(data.st.member[[s]][[m]], data.st.member[[s]][[m]]$dato.str)
        # Loop over number of different "date.str"s
        for (t in 1:length(no.date.str)) {
          data.st.member.dato[[s]][[m]][[t]] <- tmp
        }
      } # end m loop
    } # end s loop

I would also like to split by the forecast time (forecast.time), but I have not figured out how to do that.

I tried a couple of different configurations inside loops, so at the moment I don't have a consistent error message. I cannot understand what I am doing or thinking wrong.

Any help is much appreciated!

Regards, Sisse

2 answers

It is easier than you think. You can pass a list of factors to split to divide the data by several factors at once.

Reproducible example

 with(airquality, split(airquality, list(Month, Day))) 

With your data

    # Pass the columns themselves, not quoted names, so split() sees the grouping values
    data.st <- with(mydata, split(mydata, list(station, member, date.str, forecast.time)))

Note: this does not give you a nested list, as you requested, but, as Joran commented, you probably do not want one. A flat list is easier to work with.
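To make the flat-list result concrete, here is a minimal sketch using the five sample rows from the question. It assumes that forecast.time is stored as a character column (so "06" keeps its leading zero); if your column is numeric, the key will contain 6 instead. The variable name flat is only for illustration.

```r
# Sample rows taken from the question; forecast.time kept as character (assumption)
mydata <- data.frame(
  station       = c(6019, 6031, 6071, 6019, 6019),
  date.str      = c("20110805", "20110805", "20110805", "20110805", "20110806"),
  member        = c("mbr000", "mbr000", "mbr000", "mbr001", "mbr024"),
  forecast.time = c("06", "06", "06", "12", "18"),
  data1         = c(77, 28, 45, 22, 66),
  stringsAsFactors = FALSE
)

# drop = TRUE discards combinations of factor levels that never occur in the data
flat <- split(
  mydata,
  list(mydata$station, mydata$member, mydata$date.str, mydata$forecast.time),
  drop = TRUE
)

# Each element is named by joining the four values with "."
names(flat)

# Look up one station/member/date/forecast-time combination by its key
flat[["6019.mbr000.20110805.06"]]
```

Without drop = TRUE the list also contains an empty data frame for every combination that does not occur, which can get very large for a big data set.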

Wild speculation: do you just want to calculate statistics on different chunks of the data? If so, see the many questions here on split-apply-combine problems.
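If summary statistics are indeed the goal, a sketch along these lines may be all that is needed. It uses only base R and the sample rows from the question; the choice of mean as the statistic and station as the grouping variable is an assumption made for illustration.

```r
# Sample rows taken from the question
mydata <- data.frame(
  station       = c(6019, 6031, 6071, 6019, 6019),
  date.str      = c("20110805", "20110805", "20110805", "20110805", "20110806"),
  member        = c("mbr000", "mbr000", "mbr000", "mbr001", "mbr024"),
  forecast.time = c("06", "06", "06", "12", "18"),
  data1         = c(77, 28, 45, 22, 66),
  stringsAsFactors = FALSE
)

# Split-apply-combine in one call: mean of data1 per station
station_means <- aggregate(data1 ~ station, data = mydata, FUN = mean)
station_means

# The same idea spelled out: split the values, apply mean, combine into a vector
sapply(split(mydata$data1, mydata$station), mean)
```

More grouping variables can be added on the right-hand side of the formula, for example data1 ~ station + member.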


I'll echo the others: this recursive data structure will be difficult to work with, and there are probably better ways. Take a look at the split-apply-combine approach, as Richie suggested. However, the constraints may be external, so here is an answer using the plyr library.

    library(plyr)
    mylist <- dlply(mydata, .(station), dlply, .(member), dlply, .(date.str), dlply, .(forecast.time), identity)

Using the piece of data that you gave for mydata,

    > mylist[[c("6019","mbr000","20110805","6")]]
      station date.str member forecast.time data1
    1    6019 20110805 mbr000             6    77
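If you would rather not depend on plyr, the same kind of nested list can be sketched in base R with a small recursive helper. The function name nested_split is hypothetical (it is not part of any package), and the sample data again assumes forecast.time is a character column, so the last key is "06" rather than "6".

```r
# Sample rows taken from the question; forecast.time kept as character (assumption)
mydata <- data.frame(
  station       = c(6019, 6031, 6071, 6019, 6019),
  date.str      = c("20110805", "20110805", "20110805", "20110805", "20110806"),
  member        = c("mbr000", "mbr000", "mbr000", "mbr001", "mbr024"),
  forecast.time = c("06", "06", "06", "12", "18"),
  data1         = c(77, 28, 45, 22, 66),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split by the first variable, then recurse on the rest
nested_split <- function(df, vars) {
  if (length(vars) == 0) return(df)            # no variables left: return the piece
  pieces <- split(df, df[[vars[1]]], drop = TRUE)
  lapply(pieces, nested_split, vars = vars[-1])
}

mylist <- nested_split(mydata, c("station", "member", "date.str", "forecast.time"))

# [[ with a character vector indexes recursively, one level per element
mylist[[c("6019", "mbr000", "20110805", "06")]]
```

Because each level is split only on the values present in that piece, the nested list contains no empty branches.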

Source: https://habr.com/ru/post/896806/

