Create an efficient week-over-week calculation with a subset

Question

In my working dataset, I am trying to calculate week-over-week changes in wholesale and revenue. The code seems to work, but by my estimate it will take about 75 hours to run what seems like a simple calculation. The following is a generic reproducible version that takes about 2 minutes to run on this smaller dataset:

    ########################################################################
    # MAKE A GENERIC REPRODUCIBLE Qaru QUESTION
    ########################################################################

    # Create empty data frame of 26,000 observations similar to my data, but populated with noise
    exampleData <- data.frame(product = rep(LETTERS,1000),
                              wholesale = rnorm(1000*26),
                              revenue = rnorm(1000*26))

    # create a week_ending column which increases by one week with every set of 26 "products"
    for(i in 1:nrow(exampleData)){
      exampleData$week_ending[i] <- as.Date("2016-09-04")+7*floor((i-1)/26)
    }
    exampleData$week_ending <- as.Date(exampleData$week_ending, origin = "1970-01-01")

    # create empty columns to fill
    exampleData$wholesale_wow <- NA
    exampleData$revenue_wow <- NA

    # loop through the wholesale and revenue numbers and append the week-over-week changes
    for(i in 1:nrow(exampleData)){
      # only append the week-over-week values if it is not the first week
      if(exampleData$week_ending[i]!="2016-09-04"){
        # set temporary values for the current and past week wholesale value
        currentWholesale <- exampleData$wholesale[i]
        lastWeekWholesale <- exampleData$wholesale[which(exampleData$product==exampleData$product[i] &
                                                         exampleData$week_ending==exampleData$week_ending[i]-7)]
        exampleData$wholesale_wow[i] <- currentWholesale/lastWeekWholesale -1

        # set temporary values for the current and past week revenue
        currentRevenue <- exampleData$revenue[i]
        lastWeekRevenue <- exampleData$revenue[which(exampleData$product==exampleData$product[i] &
                                                     exampleData$week_ending==exampleData$week_ending[i]-7)]
        exampleData$revenue_wow[i] <- currentRevenue/lastWeekRevenue -1
      }
    }

Any help understanding why this is taking so long or how to shorten the time would be greatly appreciated!

+5

performance loops r subset

Will wright Sep 27 '17 at 18:51

2 answers

Here is a vectorized solution using the tidyr package.

    set.seed(123)
    # Create empty data frame of 26,000 observations similar to my data, but populated with noise
    exampleData <- data.frame(product = rep(LETTERS,1000),
                              wholesale = rnorm(1000*26),
                              revenue = rnorm(1000*26))

    # create a week_ending column which increases by one week with every set of 26 "products"
    # vectorize the creation of the data
    i<-1:nrow(exampleData)
    exampleData$week_ending <- as.Date("2016-09-04")+7*floor((i-1)/26)
    exampleData$week_ending <- as.Date(exampleData$week_ending, origin = "1970-01-01")

    # create empty columns to fill
    exampleData$wholesale_wow <- NA
    exampleData$revenue_wow <- NA

    # find the index of the rows of interest (i.e. removing the first week)
    i<-i[exampleData$week_ending!="2016-09-04"]

    library(tidyr)

    # create temp variables and convert into wide format
    # the rows are products and the columns are the week-ending dates
    Wholesale<-exampleData[ ,c(1,2,4)]
    Wholesale<-spread(Wholesale, week_ending, wholesale)
    Revenue<-exampleData[ ,c(1,3,4)]
    Revenue<-spread(Revenue, week_ending, revenue)

    # number of columns
    numCol<-ncol(Wholesale)

    # remove the first two columns for current wholesale
    # remove the first and last column for last week wholesale
    # perform the calculation on every element in the data frame (this week / last week)
    Wholesale_wow<- Wholesale[ ,-c(1, 2)]/Wholesale[ ,-c(1, numCol)] - 1
    # convert back to long format
    Wholesale_wow<-gather(Wholesale_wow)

    # repeat for revenue
    Revenue_wow<- Revenue[ ,-c(1, 2)]/Revenue[ ,-c(1, numCol)] - 1
    # convert back to long format
    Revenue_wow<-gather(Revenue_wow)

    # assemble calculated values back into the original data frame
    exampleData$wholesale_wow[i]<-Wholesale_wow$value
    exampleData$revenue_wow[i]<-Revenue_wow$value

The strategy is to convert the source data to a wide format, where the rows are the product identifiers and the columns are the weeks. Then divide the "this week" data frame by the "last week" data frame element by element, convert the result back to a long format, and add the newly computed values to the exampleData data frame. This works; it is not very clean, but it is much faster than a loop. The dplyr package is another tool for this kind of work.
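To make the wide-format idea concrete, here is a minimal base R sketch on a tiny made-up dataset (3 products, 4 weeks). The names `toy`, `wide`, and `wow` are illustrative and not part of the answer above, and a plain matrix stands in for the output of tidyr's `spread()`. It assumes the rows are ordered week by week, with products in the same order within each week, as in exampleData.

```r
# Toy data: 3 products x 4 weeks, ordered week by week like exampleData
toy <- data.frame(
  product = rep(c("A", "B", "C"), times = 4),
  week    = rep(1:4, each = 3),
  value   = c(10, 20, 40,  11, 30, 20,  22, 15, 30,  11, 30, 60)
)

# Wide format: one row per product, one column per week
# (matrix() fills column by column, which matches the week-by-week row order)
wide <- matrix(toy$value, nrow = 3,
               dimnames = list(c("A", "B", "C"), paste0("week", 1:4)))

# Divide the "this week" columns by the "last week" columns in one step
wow <- wide[, -1] / wide[, -ncol(wide)] - 1

# Back to long format: as.vector() reads column by column, i.e. week by week
toy$wow <- NA_real_
toy$wow[toy$week != 1] <- as.vector(wow)
```

The first week keeps `NA` because it has no previous week to compare against.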

To compare the results of this code with yours, use a check such as:

 print(identical(goldendata, exampleData))

Here goldendata holds your known-good results. Be sure both runs use the same random numbers by calling set.seed() with the same seed before generating the data.
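One caveat, offered as a hedged aside: `identical()` requires exact equality, so it can report a mismatch when two approaches reach the same value through different floating-point arithmetic. A tolerance-based comparison with `all.equal()` is a common alternative. The data frames `golden` and `candidate` below are made-up stand-ins, not the data from this question.

```r
# Two made-up result columns that differ only by floating-point rounding
golden    <- data.frame(wow = c(NA, 0.1 + 0.2))
candidate <- data.frame(wow = c(NA, 0.3))

# Exact comparison: sensitive to the last bit of each double
exact <- identical(golden, candidate)

# Tolerance-based comparison: all.equal() returns TRUE or a description of
# the differences, so wrap it in isTRUE() to get a plain logical
close_enough <- isTRUE(all.equal(golden, candidate))
```

If the exact check fails but the tolerant one passes, the two results most likely differ only by rounding.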

+1

Dave2e Sep 27 '17 at 20:25

manotheshark · Accepted Answer · 2017-09-27T19:59:42+0000

The first for loop can be simplified with the following:

    exampleData$week_ending2 <- as.Date("2016-09-04") + 7 * floor((seq_len(nrow(exampleData)) - 1) / 26)
    setequal(exampleData$week_ending, exampleData$week_ending2)
    [1] TRUE
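As an alternative sketch of the same idea, the dates can be built by generating one date per week and repeating each one for every product. The names `n_products`, `n_weeks`, and `first_week` are introduced here for illustration only.

```r
n_products <- 26
n_weeks    <- 1000
first_week <- as.Date("2016-09-04")

# One date per week, each repeated once per product
week_ending <- rep(seq(first_week, by = "week", length.out = n_weeks),
                   each = n_products)

# Same dates as the floor()-based formula above
i <- seq_len(n_products * n_weeks)
week_ending_formula <- first_week + 7 * floor((i - 1) / n_products)
all_match <- all(week_ending == week_ending_formula)
```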

The second for loop can be replaced using data.table:

    library(data.table)
    dt1 <- as.data.table(exampleData)
    dt1[, wholesale_wow := wholesale / shift(wholesale) - 1, by = product]
    dt1[, revenue_wow := revenue / shift(revenue) - 1, by = product]
    setequal(exampleData, dt1)
    [1] TRUE

This takes about 4 milliseconds to run on my laptop.
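The grouped-lag approach takes the previous row within each product, so it relies on two assumptions: rows are in week order within each product, and no weeks are missing. Both hold for the example data. For readers without data.table, here is a rough base R sketch of the same idea using `ave()`; the helper `wow()` is a hypothetical name, and the explicit `order()` call is there to make the ordering assumption visible.

```r
set.seed(123)
exampleData <- data.frame(product   = rep(LETTERS, 1000),
                          wholesale = rnorm(1000 * 26),
                          revenue   = rnorm(1000 * 26))
exampleData$week_ending <- as.Date("2016-09-04") +
  7 * floor((seq_len(nrow(exampleData)) - 1) / 26)

# Assumption made explicit: sort by product, then by week, so that the
# previous row within a product is the previous week
exampleData <- exampleData[order(exampleData$product, exampleData$week_ending), ]

# Hypothetical helper: ratio to the previous element minus 1 (NA for the first)
wow <- function(x) x / c(NA, head(x, -1)) - 1

# ave() applies the helper within each product and keeps the row order
exampleData$wholesale_wow <- ave(exampleData$wholesale, exampleData$product, FUN = wow)
exampleData$revenue_wow   <- ave(exampleData$revenue,   exampleData$product, FUN = wow)
```

If the real data can skip weeks, a lag by position would compare non-adjacent weeks, so a join on product and `week_ending - 7` would be the safer choice.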
