Time Aggregation in Lubridate

This question asks about time aggregation in R, the operation that pandas calls resampling. The most useful answer uses the xts package to group by a specific time period and apply a function such as sum() or mean().

One of the comments suggested something similar could be done with lubridate, but did not give specifics. Can someone provide an idiomatic example using lubridate? I have read the lubridate vignette a couple of times and can imagine some combination of lubridate and plyr, but I want to make sure there is no easier way I am missing.

To make the example concrete, let's say I want the daily number of bikes traveling north in this dataset:

 library(lubridate)
 library(reshape2)
 bikecounts <- read.csv(url("http://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD"),
                        header=TRUE, stringsAsFactors=FALSE)
 names(bikecounts) <- c("Date", "Northbound", "Southbound")

The data is as follows:

 > head(bikecounts)
                     Date Northbound Southbound
 1 10/02/2012 12:00:00 AM          0          0
 2 10/02/2012 01:00:00 AM          0          0
 3 10/02/2012 02:00:00 AM          0          0
 4 10/02/2012 03:00:00 AM          0          0
 5 10/02/2012 04:00:00 AM          0          0
 6 10/02/2012 05:00:00 AM          0          0
4 answers

I don't know why you would use lubridate for this. If you are just looking for something lighter-weight than xts, you can try this:

 tapply(bikecounts$Northbound, as.Date(bikecounts$Date, format="%m/%d/%Y"), sum) 

Basically, you just need to split by date, and then apply the function.
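The same split-then-apply idea can be written out explicitly with base R's split() and sapply(). A minimal sketch, using toy data in the same shape as the question's bikecounts (the values here are hypothetical, not from the real dataset):

```r
# toy data shaped like bikecounts (hypothetical values)
bikecounts <- data.frame(
  Date = c("10/02/2012 12:00:00 AM", "10/02/2012 01:00:00 AM",
           "10/03/2012 12:00:00 AM"),
  Northbound = c(1, 2, 3),
  Southbound = c(0, 1, 1),
  stringsAsFactors = FALSE
)

# split the Northbound counts into one vector per calendar day,
# then sum each group -- equivalent to the tapply() call above
daily <- sapply(split(bikecounts$Northbound,
                      as.Date(bikecounts$Date, format = "%m/%d/%Y")),
                sum)
daily  # a named vector of daily totals
```

tapply() is exactly this split/apply pair fused into one call, which is why the one-liner above suffices.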


lubridate can be used to create the grouping factor for split-apply problems. So, for example, if you want the total for each month (ignoring the year):

 tapply(bikecounts$Northbound, month(mdy_hms(bikecounts$Date)), sum) 

But that is just a wrapper around base R functions, and in the OP's case I believe base R's as.Date is the simplest approach (as evidenced by the fact that the other answers also ignored the request to use lubridate ;-)).


Something that was not mentioned in the answer to the other question the OP linked is split.xts. period.apply splits an xts object at endpoints and applies a function to each group. You can find the endpoints useful for a given task with the endpoints function. For example, if you have an xts object x, then endpoints(x, "months") will give you the row numbers that are the last row of each month. split.xts leverages that to split an xts object: split(x, "months") will return a list of xts objects where each component covers a different month.

Although split.xts() and endpoints() are primarily intended for xts objects, they also work on some other objects, including plain time-based vectors. Even if you don't want to use xts objects, you may still find endpoints() useful for its convenience or speed (it is implemented in C):

 > split.xts(as.Date("1970-01-01") + 1:10, "weeks")
 [[1]]
 [1] "1970-01-02" "1970-01-03" "1970-01-04"

 [[2]]
 [1] "1970-01-05" "1970-01-06" "1970-01-07" "1970-01-08" "1970-01-09"
 [6] "1970-01-10" "1970-01-11"

 > endpoints(as.Date("1970-01-01") + 1:10, "weeks")
 [1]  0  3 10

I think the best use of lubridate for this task is parsing the Date strings into POSIXct objects, i.e. the mdy_hms function in this case.

Here's the xts solution using lubridate to parse Date strings.

 x <- xts(bikecounts[, -1], mdy_hms(bikecounts$Date))
 period.apply(x, endpoints(x, "days"), sum)
 apply.daily(x, sum)  # identical to above

For this specific task, xts also provides the optimized period.sum function (written in Fortran), which is very fast:

 period.sum(x, endpoints(x, "days")) 

Here is an option using data.table, after importing the CSV as above:

 library(data.table)
 # convert the data.frame to a data.table
 bikecounts <- data.table(bikecounts)
 # calculate daily totals
 bikecounts[, list(NB=sum(Northbound), SB=sum(Southbound)),
            by=as.Date(Date, format="%m/%d/%Y")]

         as.Date   NB   SB
   1: 2012-10-02 1165  773
   2: 2012-10-03 1761 1760
   3: 2012-10-04 1767 1708
   4: 2012-10-05 1590 1558
   5: 2012-10-06  926 1080
  ---
 299: 2013-07-27 1212 1289
 300: 2013-07-28  902 1078
 301: 2013-07-29 2040 2048
 302: 2013-07-30 2314 2226
 303: 2013-07-31 2008 2076

Note: you can also use fread() ("fast read") from the data.table package to read the CSV into a data.table in one step. The only drawback is that you then have to convert the date/time column from a string manually.

 e.g.:

 bikecounts <- fread("http://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD",
                     header=TRUE, stringsAsFactors=FALSE)
 setnames(bikecounts, c("Date", "Northbound", "Southbound"))
 bikecounts[, Date := as.POSIXct(Date, format="%m/%d/%Y %I:%M:%S %p")]

Using ddply from the plyr package:

 library(plyr)
 bikecounts$Date <- with(bikecounts, as.Date(Date, format = "%m/%d/%Y"))
 x <- ddply(bikecounts, .(Date), summarise,
            sumnorth = sum(Northbound), sumsouth = sum(Southbound))

 > head(x)
         Date sumnorth sumsouth
 1 2012-10-02     1165      773
 2 2012-10-03     1761     1760
 3 2012-10-04     1767     1708
 4 2012-10-05     1590     1558
 5 2012-10-06      926     1080
 6 2012-10-07      951     1191
 > tail(x)
         Date sumnorth sumsouth
 298 2013-07-26     1964     1999
 299 2013-07-27     1212     1289
 300 2013-07-28      902     1078
 301 2013-07-29     2040     2048
 302 2013-07-30     2314     2226
 303 2013-07-31     2008     2076

Here is the lubridate-based solution requested, which I also added to the related question. It uses lubridate together with zoo's aggregate() for these operations:

 ts.month.sum <- aggregate(zoo.ts, month, sum)
 ts.daily.mean <- aggregate(zoo.ts, day, mean)
 ts.mins.mean <- aggregate(zoo.ts, minutes, mean)

Obviously, you need to convert the data to a zoo object first, which is quite simple. You can also use yearmon() or yearqtr(), or custom functions, to split and apply. This method is syntactically sweeter than pandas' approach.
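As a sketch of that conversion, here is a self-contained toy example (the timestamps and counts are hypothetical stand-ins for the question's bikecounts; with the real data you would build the index with lubridate's mdy_hms on the Date column instead):

```r
library(zoo)

# toy hourly series standing in for the Northbound counts
ts.times <- as.POSIXct(c("2012-10-02 00:00:00", "2012-10-02 01:00:00",
                         "2012-10-03 00:00:00"), tz = "UTC")
zoo.ts <- zoo(c(1, 2, 3), ts.times)

# aggregate()'s second argument is applied to the index to define
# the groups; as.Date() collapses the timestamps to calendar days
ts.daily.sum <- aggregate(zoo.ts, as.Date, sum)
ts.daily.sum  # one summed value per day
```

Using month, day, or yearmon in place of as.Date gives the other groupings shown above.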


Source: https://habr.com/ru/post/950996/

