R using data.table to compute a column depending on previous rows

Question


I have several products with corresponding daily sales. I want to forecast the expected daily sales of each product based on its current cumulative sales and the total amount I expect to sell over a period of time.

The first table (the "key") has the total sales expected for each product, as well as how much I plan to sell per day depending on how much has already been sold. For example, if my cumulative sales for product A are 650, I have sold 43% of the total 1,500 and therefore forecast selling 75 the next day, because 40% < 43% < 60%.
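To make the lookup rule concrete, here is a minimal sketch in base R for the product A example above. It assumes that each Percent value in the key is the upper bound of its bracket and that brackets start at 0; the variable names are only for illustration.

```r
# Bracket upper bounds and next-day forecasts for product A (values from "key")
total_sales <- 1500
percent     <- seq(0.2, 1, 0.2)    # 0.2, 0.4, 0.6, 0.8, 1.0
forecast    <- seq(125, 25, -25)   # 125, 100, 75, 50, 25

cum_sales <- 650
pct <- cum_sales / total_sales     # about 0.433

# findInterval() gives the index of the bracket whose lower bound is <= pct
# (assumption: brackets start at 0, so 0 is prepended to the upper bounds)
bracket  <- findInterval(pct, c(0, percent))
next_day <- forecast[bracket]
next_day
```

Here pct falls in the third bracket (0.4 to 0.6), so next_day is 75, matching the example.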

I want to update the cumulative sales in the second table ("data") for each product based on the forecasted sales. Each forecast depends on the cumulative sales up to the previous period, so I cannot compute the column in a single vectorized step, and I therefore think I need to use loops.

However, my dataset contains more than 500,000 rows, and my best attempt with for loops is too slow to be practical. Any thoughts? I think Rcpp might be a potential solution, but I have not used that package or C++ before. The desired final result is shown below ("final").

library(data.table)
key <- data.table(
    Product    = c(rep("A", 5), rep("B", 5)),
    TotalSales = c(rep(1500, 5), rep(750, 5)),
    Percent    = rep(seq(0.2, 1, 0.2), 2),
    Forecast   = c(seq(125, 25, -25), seq(75, 15, -15))
)

data <- data.table(
    Date    = rep(seq(1, 9, 1), 2),
    Product = rep(c("A", "B"), each = 9L),
    Time    = rep(c(rep("Past", 4), rep("Future", 5)), 2),
    Sales   = c(190, 165, 133, 120, 0, 0, 0, 0, 0,
                72, 58, 63, 51, 0, 0, 0, 0, 0)
)

final <- data.table(
    data,
    Cum = c(190, 355, 488, 608, 683, 758, 833, 908, 958,
            72, 130, 193, 244, 304, 349, 394, 439, 484),
    Percent.Actual = c(0.13, 0.24, 0.33, 0.41, 0.46, 0.51, 0.56, 0.61, 0.64,
                       0.10, 0.17, 0.26, 0.33, 0.41, 0.47, 0.53, 0.59, 0.65),
    Forecast = c(0, 0, 0, 0, 75, 75, 75, 75, 50,
                 0, 0, 0, 0, 60, 45, 45, 45, 45)
)


performance for-loop r data.table rcpp

Creg Apr 19 '18 at 21:06


1 answer

chinsoon12 · Answer 1 · 2018-04-20T01:08:10+0000

Not sure whether this will be fast enough for your actual, full-size dataset.

library(data.table)

#convert key into a list for fast lookup
keyLs <- lapply(split(key, by="Product"), 
    function(x) list(TotalSales=x[,TotalSales[1L]], 
                     Percent=x[,Percent], 
                     Forecast=x[,Forecast]))

#for each product, fold over the future days with Reduce(): look up the forecast for the current cumulative sales, add it, and carry the new total forward
futureSales <- data[, {
        byChar <- as.character(.BY)
        list(Date=Date[Time=="Future"], 
            Cum=Reduce(function(x, y) {
                pct <- x / keyLs[[byChar]]$TotalSales
                x + keyLs[[byChar]]$Forecast[findInterval(pct, c(0, keyLs[[byChar]]$Percent))]
            },
            x=rep(0L, sum(Time=="Future")),
            init=sum(Sales[Time=="Past"]),
            accumulate=TRUE)[-1])
    },
    by=.(Product)]
futureSales 

#calculate other sales stats
futureSales[data, on=.(Date, Product)][,
    Cum := ifelse(is.na(Cum), cumsum(Sales), Cum),
    by=.(Product)][,
        ':=' (
            Percent.Actual = Cum / keyLs[[as.character(.BY)]]$TotalSales,
            Forecast = ifelse(Sales > 0, 0, c(0, diff(Cum)))
        ), by=.(Product)][]
#     Product Date Cum   Time Sales Percent.Actual Forecast
#  1:       A    1 190   Past   190      0.1266667        0
#  2:       A    2 355   Past   165      0.2366667        0
#  3:       A    3 488   Past   133      0.3253333        0
#  4:       A    4 608   Past   120      0.4053333        0
#  5:       A    5 683 Future     0      0.4553333       75
#  6:       A    6 758 Future     0      0.5053333       75
#  7:       A    7 833 Future     0      0.5553333       75
#  8:       A    8 908 Future     0      0.6053333       75
#  9:       A    9 958 Future     0      0.6386667       50
# 10:       B    1  72   Past    72      0.0960000        0
# 11:       B    2 130   Past    58      0.1733333        0
# 12:       B    3 193   Past    63      0.2573333        0
# 13:       B    4 244   Past    51      0.3253333        0
# 14:       B    5 304 Future     0      0.4053333       60
# 15:       B    6 349 Future     0      0.4653333       45
# 16:       B    7 394 Future     0      0.5253333       45
# 17:       B    8 439 Future     0      0.5853333       45
# 18:       B    9 484 Future     0      0.6453333       45
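
For comparison, the recurrence that Reduce() evaluates above can be written as a plain loop for a single product. This is only a sketch to show the dependency on the previous row, not a faster alternative; project_cum is a hypothetical helper name, and the inputs are copied from the sample key and data tables.

```r
# Hypothetical helper: roll the cumulative total forward n days for one product
project_cum <- function(start, total, percent, forecast, n) {
  cum  <- numeric(n)
  prev <- start
  for (i in seq_len(n)) {
    # the forecast for day i depends on the cumulative total up to day i - 1
    step <- forecast[findInterval(prev / total, c(0, percent))]
    prev <- prev + step
    cum[i] <- prev
  }
  cum
}

# Product A: 608 sold over the 4 past days, 5 future days to project
cum_a <- project_cum(608, 1500, seq(0.2, 1, 0.2), seq(125, 25, -25), 5)
# Product B: 244 sold over the 4 past days, 5 future days to project
cum_b <- project_cum(244, 750, seq(0.2, 1, 0.2), seq(75, 15, -15), 5)
cum_a
cum_b
```

The results should agree with the Cum column for the Future rows in the output above. A loop of this shape is also the part one would port to Rcpp if the R version proves too slow.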
