Amount for (date)

I am trying to calculate the sum of the values ​​(in XYZmin ) that occur on the same date.

My data is as follows:

 bar <- structure(list(date = structure(c(15622, 15622, 15622, 15628, 15632, 15635, 15639, 15639, 15639, 15639, 15639, 15642, 15646, 15646, 15650, 15650, 15650, 15657, 15660, 15660, 15674, 15681, 15691, 15695, 15709, 15716, 15723, 15730, 15737, 15737, 15737, 15737, 15737, 15737, 15740, 15743, 15743, 15743, 15744, 15744, 15744, 15744, 15746, 15751, 15755, 15758), class = "Date"), XYZmin = c(-20, -15, -10, -70, -60, -60, -95, -10, -10, -40, -25, -25, -20, -10, -3, -5, -25, -5, -70, -5, -30, -30, -25, 60, 60, 60, 60, 60, -10, -10, -30, -30, -10, -10, -10, -60, -30, -10, 75, -10, -10, -10, 60, 60, -15, 60)), .Names = c("date", "XYZmin"), class = "data.frame", row.names = c(NA, -46L)) head(bar) date XYZmin 1 2012-10-09 -20 2 2012-10-09 -15 3 2012-10-09 -10 4 2012-10-15 -70 5 2012-10-19 -60 6 2012-10-22 -60 

What I'm trying to accomplish is to create new XYZtot variables, where within the data that occur more than once, they sum the first and second value in the 2nd data and sum the 1st, 2nd and 3rd values on 3 data. That's what I aimed from.

 head(new_bar_with_XYZtot) date XYZmin XYZtot 1 2012-10-09 -20 -20 2 2012-10-09 -15 -35 3 2012-10-09 -10 -40 4 2012-10-15 -70 -70 5 2012-10-19 -60 -60 6 2012-10-22 -60 -60 

updated with big microbenchmark test

 alexwhan <- function(bar,date,XYZmin) ddply(bar, .(date), transform, XYZmin.sum = cumsum(XYZmin)) Arun <- function(bar,date,XYZmin) within(bar, {XYZtot <- ave( XYZmin, date, FUN=cumsum)}) agstudy <- function(bar,date,XYZmin) transform(bar, XYZtot = ave(XYZmin, date, FUN = cumsum)) # install.packages("data.table", dependencies = TRUE) library(data.table) mnel <- function(bar,date,XYZmin) bar <- data.table(bar); bar[, XYZmin.sum := cumsum(XYZmin), by = date] # install.packages("microbenchmark", dependencies = TRUE) require(microbenchmark) # run test res <- microbenchmark(alexwhan(bar,date,XYZmin), Arun(bar,date,XYZmin), agstudy(bar,date,XYZmin), mnel(bar,date,XYZmin), times = 666) ## Print results: print(res) 

numbers

 Unit: microseconds expr min lq median uq max neval alexwhan(bar, date, XYZmin) 14484.077 15056.613 15237.760 15945.482 72650.126 666 Arun(bar, date, XYZmin) 963.632 1018.311 1070.759 1138.655 4988.226 666 agstudy(bar, date, XYZmin) 1967.292 2021.115 2078.261 2158.689 9240.500 666 mnel(bar, date, XYZmin) 251.312 270.295 282.821 325.040 6540.367 666 ### Plot results: boxplot(res) 

Plot results

+4
source share
4 answers

Using ave as well, but with transform

 transform(bar, XYZtot = ave(XYZmin, date, FUN = cumsum)) 

EDIT after OP comment

 transform(bar, XYZtot = ave(XYZmin, date, FUN = function(x) if(length(x) < 1) NA else c(cumsum(x[-length(x)]),NA))) 
+3
source

If you are going to talk, I will put data.table solution

 library(data.table) bar <- data.table(bar) # assigning within bar bar[, XYZmin.sum := cumsum(XYZmin), by = date] 

It will scale for big data!

+5
source

Here ave used:

 bar <- within(bar, {XYZtot <- ave( XYZmin, date, FUN=cumsum)}) 
+3
source

That's what you need?

 bar.sum <- ddply(bar, .(date), transform, XYZmin.sum = cumsum(XYZmin)) bar.sum 
+1
source

Source: https://habr.com/ru/post/1468825/


All Articles