Is there a better solution than a for loop when you need to keep track of your current balance?

I have a large data array with millions of rows. This is time series data. For instance:

    dates <- c(1,2,3)
    purchase_price <- c(5,2,1)
    income <- c(2,2,2)
    df <- data.frame(dates=dates,price=purchase_price,income=income)

I want to create a new column that will tell me how much I spent on each day, with some rule like "if I have enough money, then buy it. Otherwise save the money."

Currently, I loop through each row of data and track the current amount of money. However, this takes forever on a large data set. As far as I can tell, I cannot use a vectorized operation because I need to track this running balance.

Inside the for loop, I do:

    balance = balance + row$income
    buy_amt = min(balance,row$price)
    balance = balance - buy_amt
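For reference, the loop described above can be written out in full roughly as follows. This is only a minimal sketch: the starting balance of 0 and the result column name `spent` are assumptions, not something given in the question.

```r
# Example data from the question
df <- data.frame(dates = c(1, 2, 3), price = c(5, 2, 1), income = c(2, 2, 2))

balance <- 0                  # assumed starting balance
spent <- numeric(nrow(df))    # preallocated result; `spent` is a hypothetical name

for (i in seq_len(nrow(df))) {
  balance <- balance + df$income[i]       # receive the day's income
  buy_amt <- min(balance, df$price[i])    # spend at most what is available
  balance <- balance - buy_amt
  spent[i] <- buy_amt
}

df$spent <- spent
```

Preallocating `spent` and indexing columns directly (rather than extracting a whole row on every iteration) avoids some of the per-row overhead, but the loop itself is still sequential.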

Is there a faster solution?

Thanks!

2 answers

As Paul points out, some iteration is necessary, because each row depends on the one before it.

However, the dependency only matters when a purchase is made (read: you only need to recalculate the balance when ...). Therefore, you can iterate in batches.

Try the following, which identifies the next row with enough balance to make a purchase. It then handles all of the preceding rows in a single call and continues from that point.

    library(data.table)
    DT <- as.data.table(df)

    ## Initial Balance
    b.init <- 2
    setattr(DT, "Starting Balance", b.init)

    ## Raw balance for the day, regardless of purchase
    DT[, balance := b.init + cumsum(income)]
    DT[, buying := FALSE]

    ## Set N, to not have to call nrow(DT) several times
    N <- nrow(DT)

    ## Initialize
    ind <- seq_len(N)

    # Identify where the next purchase is
    while (length(buys <- DT[ind, ind[which(price <= balance)]]) && min(ind) < N) {
      next.buy <- buys[[1L]]  # only grab the first one

      if (next.buy > ind[[1L]]) {
        not.buys <- ind[1L]:(next.buy - 1L)
        DT[not.buys, buying := FALSE]
      }
      DT[next.buy, `:=`(buying = TRUE, balance = (balance - price))]

      # Stop if the purchase was on the last row; otherwise (next.buy + 1):N
      # would count backwards and index past the end of the table
      if (next.buy == N) break

      # There are still rows after 'next.buy', so recalculate their balance
      ind <- (next.buy + 1L):N
      DT[ind, balance := cumsum(income) + DT[["balance"]][[ind[[1L]] - 1L]]]
    }

    # Final row needs to be outside of while-loop, or else will buy that same item multiple times
    if (DT[N, !buying && (balance >= price)])
      DT[N, `:=`(buying = TRUE, balance = (balance - price))]

Results:

    ## Show output
    {
      print(DT)
      cat("Starting Balance was", attr(DT, "Starting Balance"), "\n")
    }

    ## Starting with 3:
       dates price income balance buying
    1:     1     5      2       0   TRUE
    2:     2     2      2       0   TRUE
    3:     3     3      2       2  FALSE
    4:     4     5      2       4  FALSE
    5:     5     2      2       4   TRUE
    6:     6     1      2       5   TRUE
    Starting Balance was 3

    ## Starting with 2:
       dates price income balance buying
    1:     1     5      2       4  FALSE
    2:     2     2      2       4   TRUE
    3:     3     3      2       3   TRUE
    4:     4     5      2       0   TRUE
    5:     5     2      2       0   TRUE
    6:     6     1      2       1   TRUE
    Starting Balance was 2

    # I modified your original data slightly, for testing
    df <- rbind(df, df)
    df$dates <- seq_along(df$dates)
    df[["price"]][[3]] <- 3

For tasks that are easily expressed in terms of loops, I am increasingly convinced that Rcpp is the right solution. It is relatively easy to pick up, and you can express loop-y algorithms very naturally.

Here is a solution to your problem using Rcpp:

    #include <Rcpp.h>
    using namespace Rcpp;

    // [[Rcpp::export]]
    List purchaseWhenPossible(NumericVector date, NumericVector income,
                              NumericVector price, double init_balance = 0) {
      int n = date.length();
      NumericVector balance(n);
      LogicalVector buy(n);

      for (int i = 0; i < n; ++i) {
        // Carry the previous balance forward and add the day's income
        balance[i] = ((i == 0) ? init_balance : balance[i - 1]) + income[i];

        // Buy it if you can afford it
        if (balance[i] >= price[i]) {
          buy[i] = true;
          balance[i] -= price[i];
        } else {
          buy[i] = false;
        }
      }

      return List::create(_["buy"] = buy, _["balance"] = balance);
    }

    /*** R
    # Copying input data from Ricardo
    df <- data.frame(
      dates = 1:6,
      income = rep(2, 6),
      price = c(5, 2, 3, 5, 2, 1)
    )

    out <- purchaseWhenPossible(df$dates, df$income, df$price, 3)
    df$balance <- out$balance
    df$buy <- out$buy
    */

To run it, save it to a file called purchase.cpp, then run Rcpp::sourceCpp("purchase.cpp")

It should be very fast because the loop runs as compiled C++, but I have not done any formal benchmarking.
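If you want to check the speed yourself, something like the following could serve as a starting point. It is a rough sketch only: `purchase_r` is a hypothetical pure-R version of the same all-or-nothing rule written for comparison, and the problem size, seed, and random prices are arbitrary assumptions. The Rcpp timing only runs if `purchaseWhenPossible` has already been loaded with `Rcpp::sourceCpp()`.

```r
# Hypothetical pure-R reference implementation of the same buy-if-affordable rule
purchase_r <- function(income, price, init_balance = 0) {
  n <- length(income)
  balance <- numeric(n)
  buy <- logical(n)
  prev <- init_balance
  for (i in seq_len(n)) {
    balance[i] <- prev + income[i]
    if (balance[i] >= price[i]) {
      buy[i] <- TRUE
      balance[i] <- balance[i] - price[i]
    }
    prev <- balance[i]
  }
  list(buy = buy, balance = balance)
}

set.seed(1)                              # arbitrary seed, for reproducibility
n <- 1e6                                 # assumed problem size
income <- rep(2, n)
price <- sample(1:5, n, replace = TRUE)  # made-up prices

print(system.time(res_r <- purchase_r(income, price, 3)))

# Only runs if purchase.cpp has already been sourced
if (exists("purchaseWhenPossible")) {
  print(system.time(res_cpp <- purchaseWhenPossible(seq_len(n), income, price, 3)))
  stopifnot(identical(res_r$buy, res_cpp$buy))
}
```

Timings will vary by machine, so treat any single run as indicative rather than conclusive.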


Source: https://habr.com/ru/post/956827/

