Split a data frame into a list of data frames, but how to re-merge?

I have a large ol' data frame with two ID columns, one for courses and one for users, and I needed to split it into one data frame per course for further analysis and subsetting. After eliminating several rows from each individual course data frame, I will need to merge them back together.

I split it using, you guessed it, split, and it worked exactly as I needed. However, the return trip is harder than I thought. The R documentation says that "unsplit reverses the effect of split", but my reading on the Internet so far suggests that this is not the case when the list elements are themselves data frames.

What can I do to rejoin my modified data frames?

4 answers

This is the place for do.call. Just calling df <- rbind(split.df) will give you a strange and useless list object, but do.call("rbind", split.df) should give you the result you are looking for.
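As a minimal sketch of that difference, using the built-in mtcars data in place of the course data. The hp < 200 filter and the names spl, trimmed, wrong and merged are made up for illustration and are not from the question:

```r
# Split by a grouping column, drop some rows in every piece, then re-merge.
spl <- split(mtcars, mtcars$cyl)

# Hypothetical per-group cleanup: keep only cars with hp below 200.
trimmed <- lapply(spl, function(d) d[d$hp < 200, , drop = FALSE])

# rbind() on the list itself sees one argument (the list),
# so the result is a list matrix, not a data frame.
wrong <- rbind(trimmed)
is.data.frame(wrong)

# do.call() passes each list element to rbind() as a separate argument.
merged <- do.call("rbind", trimmed)
is.data.frame(merged)
nrow(merged) == sum(sapply(trimmed, nrow))
```

Because rbind() only stacks whatever rows are present, it does not matter how many rows were removed from each piece.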


unsplit() would work, and does seem to work, in the general situation you describe, but not in the specific situation of removing rows from the data frames produced by the split.

Consider

    > spl <- split(mtcars, mtcars$cyl)
    > str(spl, max = 1)
    List of 3
     $ 4:'data.frame': 11 obs. of 11 variables:
     $ 6:'data.frame': 7 obs. of 11 variables:
     $ 8:'data.frame': 14 obs. of 11 variables:
    > str(unsplit(spl, f = mtcars$cyl))
    'data.frame': 32 obs. of 11 variables:
     $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
     $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
     $ disp: num 160 160 108 258 360 ...
     $ hp  : num 110 110 93 110 175 105 245 62 95 123 ...
     $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
     $ wt  : num 2.62 2.88 2.32 3.21 3.44 ...
     $ qsec: num 16.5 17 18.6 19.4 17 ...
     $ vs  : num 0 0 1 1 0 1 0 1 1 1 ...
     $ am  : num 1 1 1 0 0 0 0 0 0 0 ...
     $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
     $ carb: num 4 4 1 1 2 1 4 2 2 4 ...

As we can see, unsplit() can undo the split. However, when the split data frames are processed further and rows are deleted from them, the total number of rows in the list of data frames no longer matches the length of the variable used to split the original data frame.

If you know, or can compute, the changes needed to bring the splitting variable in line with the remaining rows, then unsplit() can still be used, although it is more than likely that this will not be trivial.

The general solution is, as @Andrew Sannier mentions, the do.call(rbind, ...) idiom:
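A rough sketch of what recomputing the splitting variable could look like. It assumes the row filter can be re-evaluated against the original data frame and that no group ends up empty; the hp < 200 filter and the names keep, trimmed, f_new and restored are hypothetical:

```r
spl <- split(mtcars, mtcars$cyl)

# Hypothetical row filter, recorded against the ORIGINAL data frame.
keep <- mtcars$hp < 200

# Apply the same filter inside each piece of the split.
trimmed <- lapply(spl, function(d) d[d$hp < 200, , drop = FALSE])

# Rebuild the splitting variable so it has one entry per surviving row.
f_new <- mtcars$cyl[keep]

restored <- unsplit(trimmed, f = f_new)

# The surviving rows come back in their original order.
identical(rownames(restored), rownames(mtcars)[keep])
```

When the deletions happen independently inside each piece and cannot easily be expressed as one filter on the original data, reconstructing f_new is the non-trivial part, which is why the rbind approach is usually simpler.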

    > spl <- split(mtcars, mtcars$cyl)
    > str(do.call(rbind, spl))
    'data.frame': 32 obs. of 11 variables:
     $ mpg : num 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26 30.4 ...
     $ cyl : num 4 4 4 4 4 4 4 4 4 4 ...
     $ disp: num 108 146.7 140.8 78.7 75.7 ...
     $ hp  : num 93 62 95 66 52 65 97 66 91 113 ...
     $ drat: num 3.85 3.69 3.92 4.08 4.93 4.22 3.7 4.08 4.43 3.77 ...
     $ wt  : num 2.32 3.19 3.15 2.2 1.61 ...
     $ qsec: num 18.6 20 22.9 19.5 18.5 ...
     $ vs  : num 1 1 1 1 1 1 1 1 0 1 ...
     $ am  : num 1 0 0 1 1 1 0 1 1 1 ...
     $ gear: num 4 4 4 4 4 4 3 4 5 5 ...
     $ carb: num 1 2 2 1 2 1 1 1 2 2 ...

Andrew Sannier's answer works, but it has the side effect of altering the row names. rbind prepends the list names to them, so that, for example, "Datsun 710" becomes "4.Datsun 710". To avoid this, you can strip the list names first with unname.

Full example:

    mtcars_reorder = mtcars[order(mtcars$cyl), ]    # reorder based on cyl first
    l1 = split(mtcars_reorder, mtcars_reorder$cyl)  # split by cyl
    l1 = unname(l1)                                 # remove list names
    l2 = do.call(what = "rbind", l1)                # unsplit
    all(l2 == mtcars_reorder)                       # check if matches
    #> TRUE

Outside of base R, also consider:

  • data.table::rbindlist(), with the side effect that the result is a data.table
  • dplyr::bind_rows(), which, despite its somewhat confusing name, will bind rows across a list of data frames
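A short sketch of both calls, assuming the packages are installed (each call is skipped otherwise). The column name "group" and the variable names l, dt and df are arbitrary choices for illustration:

```r
l <- split(mtcars, mtcars$cyl)

# data.table: returns a data.table; idcol stores the list names in a new column.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::rbindlist(l, idcol = "group")
  print(class(dt))
}

# dplyr: .id plays the same role as idcol above.
if (requireNamespace("dplyr", quietly = TRUE)) {
  df <- dplyr::bind_rows(l, .id = "group")
  print(class(df))
}
```

Note that a data.table does not carry row names, so if the original row names matter, copy them into a regular column before splitting.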

Source: https://habr.com/ru/post/1444647/

