Multiple Column Moving Averages - Grouped Data

Sorry if this was answered. Today I have done many examples, but cannot find what I'm trying to do.

I have a dataset that I need to calculate a 3 point moving average. I created some dummy data below:

set.seed(1234) data.frame(Week = rep(seq(1:5), 3), Section = c(rep("a", 5), rep("b", 5), rep("c", 5)), Qty = runif(15, min = 100, max = 500), To = runif(15, min = 40, max = 80)) 

I want to calculate the MA for each group based on the "Section" column for the "Qty" and "To" columns. Ideally, the output would be a data table. The moving average will start at Week 3, so there will be an average wks of 1: 3

I am trying to master the data.table package, so the solution using this will be great, but otherwise anyone would be very grateful.

For reference only, my actual dataset will be approx. 70 sections with c.1M rows. I found that data.table will be extremely fast when crunching these volumes so far.

+5
source share
3 answers

We could use rollmean from the zoo package in combination with data.table .

 library(data.table) library(zoo) setDT(df)[, c("Qty.mean","To.mean") := lapply(.SD, rollmean, k = 3, fill = NA, align = "right"), .SDcols = c("Qty","To"), by = Section] > df # Week Section Qty To Qty.mean To.mean #1: 1 a 145.4814 73.49183 NA NA #2: 2 a 348.9198 51.44893 NA NA #3: 3 a 343.7099 50.67283 279.3703 58.53786 #4: 4 a 349.3518 47.46891 347.3271 49.86356 #5: 5 a 444.3662 49.28904 379.1426 49.14359 #6: 1 b 356.1242 52.66450 NA NA #7: 2 b 103.7983 52.10773 NA NA #8: 3 b 193.0202 46.36184 217.6476 50.37802 #9: 4 b 366.4335 41.59984 221.0840 46.68980 #10: 5 b 305.7005 48.75198 288.3847 45.57122 #11: 1 c 377.4365 72.42394 NA NA #12: 2 c 317.9899 61.02790 NA NA #13: 3 c 213.0934 76.58633 302.8400 70.01272 #14: 4 c 469.3734 73.25380 333.4856 70.28934 #15: 5 c 216.9263 41.83081 299.7977 63.89031 
+5
source

Solution using dplyr:

 library(dplyr); library(zoo) myfun = function(x) rollmean(x, k = 3, fill = NA, align = "right") df %>% group_by(Section) %>% mutate_each(funs(myfun), Qty, To) #### Week Section Qty To #### (int) (fctr) (dbl) (dbl) #### 1 1 a NA NA #### 2 2 a NA NA #### 3 3 a 279.3703 58.53786 #### 4 4 a 347.3271 49.86356 
+1
source

There is currently a faster approach using the new frollmean function in data.table, which will be published soon in CRAN under the number 1.12.0.

 setDT(df)[, c("Qty.mean","To.mean") := frollmean(.SD, 3), .SDcols = c("Qty","To"), by = Section] 
0
source

Source: https://habr.com/ru/post/1245026/


All Articles