this question is a modification of the problem that I posted here , where I have occurrences of a certain type on different days, but this time they are assigned to several users, for example:
df = data.frame(user_id = c(rep(1:2, each=5)), cancelled_order = c(rep(c(0,1,1,0,0), 2)), order_date = as.Date(c('2015-01-28', '2015-01-31', '2015-02-08', '2015-02-23', '2015-03-23', '2015-01-25', '2015-01-28', '2015-02-06', '2015-02-21', '2015-03-26'))) user_id cancelled_order order_date 1 0 2015-01-28 1 1 2015-01-31 1 1 2015-02-08 1 0 2015-02-23 1 0 2015-03-23 2 0 2015-01-25 2 1 2015-01-28 2 1 2015-02-06 2 0 2015-02-21 2 0 2015-03-26
I would like to calculate
1) the number of canceled orders that each client will have during the next x days (for example, 7, 14) , excluding the current one , and p>
1) the number of canceled orders that each customer had in the past x days (for example, 7, 14) , excluding the current one .
The desired result will look like this:
solution user_id cancelled_order order_date plus14 minus14 1 0 2015-01-28 2 0 1 1 2015-01-31 1 0 1 1 2015-02-08 0 1 1 0 2015-02-23 0 0 1 0 2015-03-23 0 0 2 0 2015-01-25 2 0 2 1 2015-01-28 1 0 2 1 2015-02-06 0 1 2 0 2015-02-21 0 0 2 0 2015-03-26 0 0
A solution that is perfect for this purpose was provided by @ joel.wilson using data.table
library(data.table) vec <- c(14, 30) # Specify desired ranges setDT(df)[, paste0("x", vec) := lapply(vec, function(i) sum(df$cancelled_order[between(df$order_date, order_date, order_date + i, # this part can be changed to reflect the past date ranges incbounds = FALSE)])), by = order_date]
However, it does not account for the grouping on user_id . When I tried to change the formula by adding this group as by = c("user_id", "order_date") or by = list(user_id, order_date) , it did not work. It seems that this is something very basic, some hints on how to get around this detail?
Also, keep in mind that I'm working on a solution, even if it is not based on the above code or data.table at all!
Thanks!