Suppose I have two datasets. One contains a list of promotions with start and end dates, and the other contains monthly sales data for each program.
promotions = data.frame(
  start.date = as.Date(c("2012-01-01", "2012-06-14", "2012-02-01", "2012-03-31", "2012-07-13")),
  end.date = as.Date(c("2014-04-05", "2014-11-13", "2014-02-25", "2014-08-02", "2014-09-30")),
  program = c("a", "a", "a", "b", "b"))
sales = data.frame(
  year.month.day = as.Date(c("2013-02-01", "2014-09-01", "2013-08-01", "2013-04-01", "2012-11-01")),
  program = c("a", "b", "a", "a", "b"),
  monthly.sales = c(200, 200, 200, 400, 200))
Please note that sales$year.month.day is used only to indicate the year and month. The day is included so that R can treat the column as a vector of Date objects, but it has nothing to do with the actual sales.
I need to determine the number of promotions that occurred per month for each program. Here is an example of a loop that produces the output I want:
sales$count = rep(0, nrow(sales))
sub = list()
for (i in seq_len(nrow(sales))) {
  # promotions that belong to the same program as this sales row
  sub[[i]] = promotions[which(promotions$program == sales$program[i]), ]
  if (nrow(sub[[i]]) > 0) {
    for (j in seq_len(nrow(sub[[i]]))) {
      # count the promotion if the sales date falls within its date range
      if (sales$year.month.day[i] %in% seq(from = as.Date(sub[[i]]$start.date[j]), to = as.Date(sub[[i]]$end.date[j]), by = "day")) {
        sales$count[i] = sales$count[i] + 1
      }
    }
  }
}
Expected output:
sales = data.frame(
  year.month.day = as.Date(c("2013-02-01", "2014-09-01", "2013-08-01", "2013-04-01", "2012-11-01")),
  program = c("a", "b", "a", "a", "b"),
  monthly.sales = c(200, 200, 200, 400, 200),
  count = c(3, 1, 3, 3, 2)
)
However, since my actual datasets are very large, this loop crashes when I run it in R.
Is there a more efficient way to do this, perhaps with dplyr?
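For reference, here is a minimal sketch of one loop-free way to get the same counts, written in base R rather than dplyr so that it runs without extra packages. It assumes both tables fit in memory once merged on program. The names row.id, pairs, active and counts are made up for this illustration and are not part of the original data.

```r
promotions <- data.frame(
  start.date = as.Date(c("2012-01-01", "2012-06-14", "2012-02-01", "2012-03-31", "2012-07-13")),
  end.date = as.Date(c("2014-04-05", "2014-11-13", "2014-02-25", "2014-08-02", "2014-09-30")),
  program = c("a", "a", "a", "b", "b"))
sales <- data.frame(
  year.month.day = as.Date(c("2013-02-01", "2014-09-01", "2013-08-01", "2013-04-01", "2012-11-01")),
  program = c("a", "b", "a", "a", "b"),
  monthly.sales = c(200, 200, 200, 400, 200))

# Tag each sales row (hypothetical helper column) so that the counts can be
# mapped back to the original row order after the merge.
sales$row.id <- seq_len(nrow(sales))

# Pair every sales row with every promotion of the same program.
# all.x = TRUE keeps sales rows whose program has no promotions at all.
pairs <- merge(sales, promotions, by = "program", all.x = TRUE)

# A promotion counts if the sales date lies within [start.date, end.date].
# Rows without a matching promotion have NA dates and are treated as FALSE.
active <- !is.na(pairs$start.date) &
  pairs$year.month.day >= pairs$start.date &
  pairs$year.month.day <= pairs$end.date

# Sum the active promotions per sales row and restore the original order.
counts <- tapply(active, pairs$row.id, sum)
sales$count <- as.integer(counts[as.character(sales$row.id)])
sales$row.id <- NULL

print(sales)
```

Comparing the dates directly avoids building a day-by-day seq() for every promotion, which is where the loop spends most of its time. The merge still creates one row per sales/promotion pair within a program, so memory may remain a concern for very large inputs.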