Search for the last observation earlier than some timestamp with XTS

I have an xts object that looks like this:

    > q.xts
                                      val
    2011-08-31 09:30:00.002357 -1.0135222
    2011-08-31 09:30:00.003443 -0.2182679
    2011-08-31 09:30:00.005075 -0.5317191
    2011-08-31 09:30:00.009515 -1.0639535
    2011-08-31 09:30:00.011569 -1.2470759
    2011-08-31 09:30:00.012144  0.7678103
    2011-08-31 09:30:00.023813 -0.6303432
    2011-08-31 09:30:00.024107 -0.5105943

I am calculating a fixed offset from timestamps in another data frame, r. The number of rows in r is significantly smaller than the number of rows in q.xts.

    > r
                            time               predict.time
    1 2011-08-31 09:30:00.003443 2011-08-31 09:30:00.002443
    2 2011-08-31 09:30:00.009515 2011-08-31 09:30:00.008515
    3 2011-08-31 09:30:00.024107 2011-08-31 09:30:00.023108

The time column corresponds to an observation from q.xts, while the predict.time column is 1 millisecond earlier than time (apart from any rounding error in the displayed precision).

I would like to find the last observation from q.xts that is equal to or earlier than the time for each predict.time value. For the three observations in r above, I would expect the following matches:

                            time               predict.time      (time from q.xts)
    1 2011-08-31 09:30:00.003443 2011-08-31 09:30:00.002443 --> 09:30:00.002357
    2 2011-08-31 09:30:00.009515 2011-08-31 09:30:00.008515 --> 09:30:00.005075
    3 2011-08-31 09:30:00.024107 2011-08-31 09:30:00.023108 --> 09:30:00.012144

I approached this by looping over each line in r and doing an xts subset. So, for line 1 of r, I would do:

    > last(index(q.xts[paste('/', r[1,]$predict.time, sep='')]))
    [1] "2011-08-31 09:30:00.002357 CDT"

QUESTION: Doing this with a loop seems awkward and inconvenient. Is there a better way? I would like to add another column to r that gives the exact time or row number of the corresponding observation in q.xts.


NOTE. Use this to create the data that I used for this example:

    library(xts)

    q <- read.csv(tc <- textConnection("
    2011-08-31 09:30:00.002358, -1.01352216
    2011-08-31 09:30:00.003443, -0.21826793
    2011-08-31 09:30:00.005076, -0.53171913
    2011-08-31 09:30:00.009515, -1.06395353
    2011-08-31 09:30:00.011570, -1.24707591
    2011-08-31 09:30:00.012144, 0.76781028
    2011-08-31 09:30:00.023814, -0.63034317
    2011-08-31 09:30:00.024108, -0.51059425"), header=FALSE); close(tc)
    colnames(q) <- c('datetime', 'val')
    q.xts <- xts(q[-1], as.POSIXct(q$datetime))

    r <- read.csv(tc <- textConnection("
    2011-08-31 09:30:00.003443
    2011-08-31 09:30:00.009515
    2011-08-31 09:30:00.024108"), header=FALSE); close(tc)
    colnames(r) <- c('time')
    r$time <- as.POSIXct(strptime(r$time, '%Y-%m-%d %H:%M:%OS'))
    r$predict.time <- r$time - 0.001
2 answers

There may be a better way to do this, but this is the best I can think of at the moment.

    # create an empty xts object based on r$predict.time
    r.xts <- xts(, r$predict.time)
    # merge q.xts and r.xts. This will insert NAs at the times in r.xts.
    tmp <- merge(q.xts, r.xts)
    # Here's the magic:
    # lag tmp *backwards* one period, so the NAs appear at the times
    # right before the times in r.xts. Then grab the index for the NA periods
    tmp.index <- index(tmp[is.na(lag(tmp, -1, na.pad=FALSE))])
    # get the rows in q.xts for the times in tmp.index
    out <- q.xts[tmp.index]
    #                                   val
    # 2011-08-31 09:30:00.002357 -1.0135222
    # 2011-08-31 09:30:00.005075 -0.5317191
    # 2011-08-31 09:30:00.012144  0.7678103

I would use findInterval:

    findInterval(r$predict.time, index(q.xts))

    > q.xts[findInterval(r$predict.time, index(q.xts)),]
                               val
    2011-08-31 09:30:00 -1.0135222
    2011-08-31 09:30:00 -0.5317191
    2011-08-31 09:30:00  0.7678103

Your times are POSIXct, so this should be reliable enough.
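To get the extra column the question asks for, the findInterval result can be stored in r directly. The following is only a minimal sketch in base R: q.times is a hypothetical stand-in for index(q.xts), the column names q.row and q.time are made up for illustration, and the time zone is set to UTC purely so the example is reproducible. It relies on findInterval requiring a sorted vector (an xts index is always ordered) and returning 0 when a value falls before the first observation, which is mapped to NA here.

```r
# Hypothetical stand-in for index(q.xts), using the timestamps from the question.
q.times <- as.POSIXct(c("2011-08-31 09:30:00.002358",
                        "2011-08-31 09:30:00.003443",
                        "2011-08-31 09:30:00.005076",
                        "2011-08-31 09:30:00.009515",
                        "2011-08-31 09:30:00.011570",
                        "2011-08-31 09:30:00.012144",
                        "2011-08-31 09:30:00.023814",
                        "2011-08-31 09:30:00.024108"), tz = "UTC")

r <- data.frame(time = as.POSIXct(c("2011-08-31 09:30:00.003443",
                                    "2011-08-31 09:30:00.009515",
                                    "2011-08-31 09:30:00.024108"), tz = "UTC"))
r$predict.time <- r$time - 0.001

# For each predict.time, position of the last q.times value that is <= it.
idx <- findInterval(r$predict.time, q.times)

# findInterval returns 0 when predict.time precedes every observation;
# turn that into NA so the lookups below do not silently drop rows.
idx[idx == 0] <- NA_integer_

r$q.row  <- idx           # row number in q.xts (illustrative column name)
r$q.time <- q.times[idx]  # matching timestamp (illustrative column name)
```

With the objects from the question, replacing q.times with index(q.xts) should give the same row numbers, and q.xts[r$q.row, ] would then return the matching observations as long as none of the rows is NA.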


Source: https://habr.com/ru/post/1381947/

