R: Efficiently subsetting data based on time of day

I have a large (150,000x7) data frame that I intend to use for back-testing and real-time analysis of a financial market. The data represents the state of an investment vehicle at 5-minute intervals (although holes exist). It looks like this (but much longer):

            pTime     Time  Price       M1       M2        M3        M4
     1 1212108300 20:45:00 1.5518 12.21849 -0.37125   4.50549 -31.00559
     2 1212108900 20:55:00 1.5516 11.75350 -0.81792  -1.53846 -32.12291
     3 1212109200 21:00:00 1.5512 10.75070 -1.47438  -8.24176 -34.35754
     4 1212109500 21:05:00 1.5514 10.23529 -1.06044  -8.46154 -33.24022
     5 1212109800 21:10:00 1.5514  9.74790 -1.02759 -10.21978 -33.24022
     6 1212110100 21:15:00 1.5513  9.31092 -1.17076 -11.97802 -33.79888
     7 1212110400 21:20:00 1.5512  8.84034 -1.28428 -13.62637 -34.35754
     8 1212110700 21:25:00 1.5509  8.07843 -1.63715 -18.24176 -36.03352
     9 1212111000 21:30:00 1.5509  7.39496 -1.49198 -20.65934 -36.03352
    10 1212111300 21:35:00 1.5512  7.65266 -1.03717 -18.57143 -34.35754

The data is preloaded into R, but during my back-test I need to subset it by two criteria:

The first criterion is a sliding window, so as not to look into the future. The window should be such that every new 5-minute interval in the back-test shifts the entire window into the future by 5 minutes. This part I can do like this:

    require(zoo)
    zooser <- zoo(x=tser$Close, order.by=as.POSIXct(tser$pTime, origin="1970-01-01"))
    window(zooser, start=A, end=B)

The second criterion is another sliding window, but one that slides through the time of day and contains only those rows that are within N minutes of the input time, on any day.

Example: if the window size is 2 hours and the input time is 12:00PM, then the window should contain all rows with Time between 10:00AM and 2:00PM.

This is the part I am having trouble figuring out.

Edit: My data has holes in it; two consecutive rows can be MORE than 5 minutes apart. The data looks like this (heavily zoomed in): [image]

When a window moves over these gaps, the number of points inside the window should change.

Below is my MySQL code that does what I want to do in R (same table structure):

    SET @qTime = Time(FROM_UNIXTIME(SAMP_endTime));
    SET @inc = -1;
    INSERT INTO MetIndListBuys (pTime, ArrayPos, M1, M2, M3, M4)
    SELECT pTime, @inc := @inc + 1, M1, M2, M3, M4
    FROM mergebuys USE INDEX (`y`)
    WHERE pTime BETWEEN SAMP_startTime AND SAMP_endTime
      AND TIME_TO_SEC(TIMEDIFF(Time, @qTime)) / 3600 BETWEEN 0 - HourSpan AND HourSpan;
2 answers

Say you have a target time t0 on the same scale as pTime: seconds since the epoch. Then t0 - pTime = (the difference in whole days between the two, expressed in seconds) + (the difference in the remaining seconds). Taking (t0 - pTime) %% (number of seconds per day) leaves us with the difference in seconds in clock arithmetic (wrapped around if the difference is negative). Note the parentheses: in R, %% binds more tightly than binary minus. This suggests the following function:

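    SecondsPerDay <- 24 * 60 * 60

    # Note: this name masks base R's within() once it is defined
    within <- function(d, t0Sec, wMin) {
      # offset from t0 in clock arithmetic, always in [0, SecondsPerDay)
      diff <- (d$pTime - t0Sec) %% SecondsPerDay
      wSec <- 60 * wMin
      # keep rows less than wSec after t0, or less than wSec before it (wrapped)
      return(d[diff < wSec | diff > (SecondsPerDay - wSec), ])
    }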
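As a rough usage sketch (not part of the original answer), the same clock-arithmetic filter can be tried on a tiny made-up data frame. The name `within_time_of_day` and the toy values are assumptions for illustration only; the helper is renamed here simply to avoid masking base R's `within()`. It assumes `pTime` and the target time are both epoch seconds, so "time of day" means UTC clock time and daylight saving shifts are ignored.

```r
seconds_per_day <- 24 * 60 * 60

# Hypothetical helper: same logic as the answer's function, different name
within_time_of_day <- function(d, t0_sec, w_min) {
  # Offset from t0 in clock arithmetic, always in [0, seconds_per_day)
  offset <- (d$pTime - t0_sec) %% seconds_per_day
  w_sec <- 60 * w_min
  d[offset < w_sec | offset > (seconds_per_day - w_sec), , drop = FALSE]
}

# Made-up rows on three different days
toy <- data.frame(
  pTime = c(23 * 3600 + 50 * 60,      # day 0, 23:50
            86400 + 10 * 60,          # day 1, 00:10
            2 * 86400 + 12 * 3600),   # day 2, 12:00
  Price = c(1.0, 1.1, 1.2)
)

# Target time: midnight at the start of day 3, window of +/- 15 minutes
hits <- within_time_of_day(toy, t0_sec = 3 * 86400, w_min = 15)
print(hits)
```

The 23:50 and 00:10 rows are both kept even though they fall on different days and on opposite sides of midnight, which is the wrap-around behaviour the modulo provides. To also respect the look-back window from the question, the data frame can first be restricted to the desired pTime range before applying the filter.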

1) If DF is the data frame shown in the question, create a zoo object from it as you did and split it into days, giving zs. Then lapply over the days, applying your function f to each successive set of w points within each component (i.e. within each day). For example, if you want to apply your function to 2 hours of data at a time and your data is regularly spaced at 5-minute intervals, then w = 24 (since there are 24 five-minute periods in two hours). In that case f is passed 24 rows of data as a matrix each time it is called. Inside f, the condition defining ix drops any rows that are more than w * 5 minutes older than the last row of the window, so windows that straddle a hole use fewer points. Also, align is set to "right" below, but it can alternatively be set to align="center" with the condition defining ix changed to be two-sided, etc. For more on rollapply see ?rollapply

    library(zoo)

    # keep pTime as the first data column so f can see the timestamps
    z <- zoo(DF[-2], as.POSIXct(DF[,1], origin = "1970-01-01"))
    zs <- split(z, as.Date(time(z)))  # one zoo object per day, as described above

    w <- 3 # replace this with 24 to handle two hours at a time with five min data
    f <- function(x) {
      tt <- x[, 1]
      ix <- tt[w] - tt <= w * 5 * 60 # RHS converts w to seconds
      x <- x[ix, -1]
      sum(x) # replace sum with your function
    }
    out <- lapply(zs, rollapply, w, f, by.column = FALSE, align = "right")

Using the data frame in the question, we get the following:

    > out
    $`2008-05-30`
    2008-05-30 02:00:00 2008-05-30 02:05:00 2008-05-30 02:10:00 2008-05-30 02:15:00
              -66.04703           -83.92148           -95.93558          -100.24924
    2008-05-30 02:20:00 2008-05-30 02:25:00 2008-05-30 02:30:00 2008-05-30 02:35:00
             -108.15038          -121.24519          -134.39873          -140.28436
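To see how the ix condition in f reacts to holes, here is a small base R sketch. The helper `rows_kept` and the two toy matrices are made up for illustration; the helper applies the same test as f but returns how many rows survive instead of summing them.

```r
w <- 3  # window width in rows, as in the answer

# Hypothetical helper: same gap test as f, but counts the surviving rows
rows_kept <- function(x) {
  tt <- x[, 1]                      # first column holds pTime in seconds
  ix <- tt[w] - tt <= w * 5 * 60    # within w * 5 minutes of the last row
  sum(ix)
}

# Made-up windows: one regularly spaced, one with a 30-minute hole
regular <- cbind(pTime = c(0, 300, 600),   Price = c(1, 2, 3))
gappy   <- cbind(pTime = c(0, 1800, 2100), Price = c(1, 2, 3))

kept_regular <- rows_kept(regular)
kept_gappy   <- rows_kept(gappy)
print(c(regular = kept_regular, gappy = kept_gappy))
```

In the regular window all three rows are kept, while in the window with the hole the row before the gap is dropped. One thing to watch for when swapping in your own function: if only one row survives, `x[ix, -1]` collapses to a plain vector unless `drop = FALSE` is added.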

By the way, be sure to read this post.

2) Alternatively, this can be done as follows, where w and f are as above:

    n <- nrow(DF)
    m <- as.matrix(DF[-2])
    sapply(w:n, function(i) { m <- m[seq(length = w, to = i), ]; f(m) })

Replace sapply with lapply if necessary. Also, this may seem shorter than the first solution, but it does not differ much once you add the code that defines f and w (which appears in the first but not the second).
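For readers unfamiliar with the `seq(length = w, to = i)` idiom, the following self-contained sketch runs the second approach on a made-up four-row matrix. The toy values are assumptions for illustration; f is copied from the first solution, and `length.out` is simply the full spelling of the `length` argument.

```r
w <- 3

# f as defined in the first solution
f <- function(x) {
  tt <- x[, 1]
  ix <- tt[w] - tt <= w * 5 * 60  # RHS converts w to seconds
  x <- x[ix, -1]
  sum(x)
}

# Made-up data: pTime in seconds, regularly spaced at 5 minutes
m <- cbind(pTime = c(0, 300, 600, 900), Price = c(1, 2, 3, 4))
n <- nrow(m)

# Row indices of each window: the w rows ending at row i
idx <- lapply(w:n, function(i) seq(length.out = w, to = i))

# Apply f to each window of rows
res <- sapply(idx, function(rows) f(m[rows, , drop = FALSE]))
print(idx)
print(res)
```

Each window covers the w rows ending at row i, so the first result uses rows 1 to 3 and the second uses rows 2 to 4.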

If there are no holes within a day, only gaps between days, then these solutions can be simplified.


Source: https://habr.com/ru/post/1386947/

