What is the best way to bin intraday volume figures for a stock using xts / zoo etc. in R?

For example, let's say you have ~10 years of 1-minute intraday volume data for instrument x, as follows (in xts format), from 9:30 to 16:30 each day:

  Date.Time            Volume
  2001-01-01 09:30:00  1200
  2001-01-01 09:31:00  1110
  2001-01-01 09:32:00  1303

Through to the end:

  2010-12-20 16:28:00  3200
  2010-12-20 16:29:00  4210
  2010-12-20 16:30:00  8303

I would like to:

  • Get the average volume per minute for the entire series (i.e. the average volume for all 10 years at 9:30, 9:31, 9:32 ... 16:28, 16:29, 16:30)

What is the best way to go about:

  • Aggregating the data into one-minute buckets
  • Getting the average of those buckets
  • Collapsing these "average" buckets back into a single xts / zoo time series?

I have had a good play around with the aggregate, sapply, period.apply functions etc., but just cannot seem to bin the data correctly.

This is fairly easy to solve with a loop, but that is very slow. I would prefer to avoid a programmatic solution and use a function that takes advantage of the C++ architecture (i.e. an xts-based solution).

Can anyone offer advice / solution?

Thanks for that in advance.

2 answers

First, create some test data:

 library(xts) # also pulls in zoo
 library(timeDate)
 library(chron) # includes times class

 # test data
 x <- xts(1:3, timeDate(c("2001-01-01 09:30:00", "2001-01-01 09:31:00", "2001-01-02 09:30:00")))

1) aggregate.zoo. Now try converting the index to the times class and aggregating with this one-liner:

 aggregate(as.zoo(x), times(format(time(x), "%H:%M:%S")), mean) 

1a) aggregate.zoo (variation). Or use this variation, which converts the shorter aggregated series to times, so that the conversion does not have to be done on the longer original series:

 ag <- aggregate(as.zoo(x), format(time(x), "%H:%M:%S"), mean)
 zoo(coredata(ag), times(time(ag)))

2) tapply. An alternative would be tapply, which is most likely faster:

 ta <- tapply(coredata(x), format(time(x), "%H:%M:%S"), mean)
 zoo(unname(ta), times(names(ta)))

EDIT: Simplified (1) and Added (1a) and (2)
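
To make the grouping used in (2) concrete, here is a minimal base-R sketch that needs no packages. The timestamps and volumes are made-up illustration values, and the names `stamps`, `volume`, `minute_of_day` and `avg_by_minute` are hypothetical, not part of the answer above. The key point is that the grouping key is the time of day only, so the same minute on different days falls into the same bucket.

```r
# Hypothetical sample: two minutes observed on two different days (made-up values)
stamps <- as.POSIXct(c("2001-01-01 09:30:00", "2001-01-01 09:31:00",
                       "2001-01-02 09:30:00", "2001-01-02 09:31:00"),
                     tz = "UTC")
volume <- c(1200, 1110, 1400, 1290)

# Grouping key: time of day only, so the date part is discarded
minute_of_day <- format(stamps, "%H:%M:%S")

# Mean volume per time-of-day bucket; the result is named by the key
avg_by_minute <- tapply(volume, minute_of_day, mean)
print(avg_by_minute)
# 09:30:00 -> 1300, 09:31:00 -> 1200
```

The names of the result can then be converted back to a time class and used as the index of a zoo series, as in (2).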


Here is a solution with ddply , but you can probably also use sqldf , tapply , aggregate , by , etc.

 # Sample data
 minutes <- 10 * 60
 days <- 250 * 10
 d <- seq.POSIXt( ISOdatetime( 2011,01,01,09,00,00, "UTC" ), by="1 min", length.out=minutes )
 d <- outer( d, (1:days) * 24*3600, `+` )
 d <- sort(d)
 library(xts)
 d <- xts( round(100*rlnorm(length(d))), d )

 # Aggregate
 library(plyr)
 d <- data.frame( minute=format(index(d), "%H:%M"), value=coredata(d) )
 d <- ddply( d, "minute", summarize, value=mean(value, na.rm=TRUE) )

 # Convert to zoo or xts
 zoo(x=d$value, order.by=d$minute) # The index does not have to be a date or time
 # format= must be named: the second positional argument of as.POSIXct is tz
 xts(x=d$value, order.by=as.POSIXct(sprintf("2012-01-01 %s:00",d$minute), format="%Y-%m-%d %H:%M:%S") )
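
As one of the alternatives mentioned above, the ddply step could probably be replaced by base R's aggregate with a formula. This is only a sketch on a small made-up data frame; `df` and `avg` are hypothetical names, and the columns `minute` and `value` mirror the data frame built in the answer.

```r
# Hypothetical sample with the same columns as the data frame above (made-up values)
df <- data.frame(
  minute = c("09:30", "09:31", "09:30", "09:31"),
  value  = c(100, 250, 300, NA)
)

# Mean of `value` within each `minute` bucket.
# na.action = na.pass keeps rows containing NA, so that na.rm = TRUE
# (passed through to mean) is what actually drops the missing values.
avg <- aggregate(value ~ minute, data = df, FUN = mean, na.rm = TRUE,
                 na.action = na.pass)
print(avg)
# 09:30 -> 200, 09:31 -> 250
```

The result is a data frame with one row per minute, so it can be converted to zoo or xts in the same way as the ddply output.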

Source: https://habr.com/ru/post/909256/