Find adjacent rows matching a condition

I have a financial time series in R (currently an xts object, but I am also looking into tibble now).

How can I find the probability that 2 adjacent rows both match a condition?

For example, I want to know the likelihood that 2 consecutive days will both have a value higher than the mean / median value. I know that I can lag the previous day's value onto the next row, which would let me compute these statistics, but that seems cumbersome and inflexible.

Is there a better way to do this?

xts:

 library(xts)
 foo <- xts(x = c(1,1,5,1,5,5,1), order.by = seq(as.Date("2016-01-01"), length.out = 7, by = "days"))

What is the probability that 2 consecutive days have a value above the median?

2 answers

You can create a new column that flags which values are above the median, and then keep only the rows where both that row and the previous one are above it.

 library(tibble)
 foo <- tibble(x = c(1,1,5,1,5,5,1), date = seq(as.Date("2016-01-01"), length.out = 7, by = "days"))

Step 1

Create a column to find those above the median

 foo$higher_than_median <- foo$x > median(foo$x)

Step 2

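Compare this column with its own diff(). Since diff() returns one element fewer than its input, a leading 0 pads the result back to the original length.

Keep a row only when its flag is the same as the previous row's (both higher or both lower): c(0, diff(foo$higher_than_median)) == 0

Then add the condition that the row itself must be higher: foo$higher_than_median == TRUE

Full expression:

 foo$both_higher <- c(0, diff(foo$higher_than_median)) == 0 & foo$higher_than_median == TRUE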

Step 3

To find the probability, take the mean of foo$both_higher:

 mean(foo$both_higher)
 [1] 0.1428571
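
The same idea can be generalised to runs of any length. The following is only a minimal sketch in base R, not part of the answer above: prob_consecutive is a hypothetical helper name, and it assumes the flag vector has no missing values. Unlike the expression above, it divides by the number of windows of k adjacent rows (6 pairs for 7 rows) rather than by the number of rows, so for k = 2 it gives 1/6 instead of 1/7 on this data.

```r
# Hypothetical helper (not from the answer): share of windows of k adjacent
# rows in which every row satisfies the condition. Assumes no NA values.
prob_consecutive <- function(flag, k = 2) {
  stopifnot(is.logical(flag), k >= 1, length(flag) >= k)
  # embed() returns a matrix with one row per window of k adjacent values
  windows <- embed(as.integer(flag), k)
  # a window qualifies only if all k of its flags are TRUE
  mean(rowSums(windows) == k)
}

x <- c(1, 1, 5, 1, 5, 5, 1)
above <- x > median(x)

prob_two   <- prob_consecutive(above, k = 2)  # 2 consecutive days
prob_three <- prob_consecutive(above, k = 3)  # 3 consecutive days
```

Which denominator is appropriate (rows or windows) depends on how the probability is defined, so treat the choice here as an assumption.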

Here is a clean xts solution.

How do you define the median? There are several ways.

In an online time series, as when calculating a moving average, you can compute the median over a fixed lookback window (shown below) or from the origin up to the current date (an anchored window calculation). You will not know future values beyond the current time step when computing the median, so this avoids look-ahead bias:

 library(xts)
 library(TTR)
 x <- rep(c(1,1,5,1,5,5,1, 5, 5, 5), 10)
 y <- xts(x = x, seq(as.Date("2016-01-01"), length = length(x), by = "days"), dimnames = list(NULL, "x"))
 # Avoid look ahead bias in an online time series application by computing the median over a rolling fixed time window:
 nMedLookback <- 5
 y$med <- runPercentRank(y[, "x"], n = nMedLookback)
 y$isAboveMed <- y$med > 0.5
 nSum <- 2
 y$runSum2 <- runSum(y$isAboveMed, n = nSum)
 z <- na.omit(y)
 prob <- sum(z[,"runSum2"] >= nSum) / NROW(z)
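
The anchored-window alternative mentioned above is not shown in the answer. As a rough sketch in base R, assuming a plain numeric vector and that the current day's value is included in its own median, it could look like this (the variable names expanding_med, above and both are illustrative only):

```r
x <- c(1, 1, 5, 1, 5, 5, 1)

# Anchored (expanding) median: element i uses only x[1], ..., x[i],
# so no future values enter the calculation.
expanding_med <- vapply(
  seq_along(x),
  function(i) median(x[seq_len(i)]),
  numeric(1)
)

# Flag the days that are above the median known at that point in time
above <- x > expanding_med

# Compare each day with the previous one: both must be flagged
n <- length(above)
both <- above[-1] & above[-n]

prob <- mean(both)
```

For long series the repeated median() calls are quadratic in the length of x, so this is meant to illustrate the windowing rather than to be efficient.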

The case where your median is across the entire dataset is obviously a much easier modification of this.
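
One possible reading of that modification, sketched here with the 7-day series from the question rather than the longer series above, and assuming the xts and TTR packages are installed: replace the rolling percent rank with a single comparison against the whole-sample median. This is an illustration under those assumptions, not the answerer's own code.

```r
library(xts)
library(TTR)

# Data from the question
y <- xts(c(1, 1, 5, 1, 5, 5, 1),
         order.by = seq(as.Date("2016-01-01"), length.out = 7, by = "days"))
colnames(y) <- "x"

# Whole-sample median (this uses future values, so it is only suitable
# for after-the-fact analysis)
med <- median(as.numeric(coredata(y$x)))
y$isAboveMed <- y$x > med

# Rolling count of above-median days over the last nSum days
nSum <- 2
y$runSum2 <- runSum(y$isAboveMed, n = nSum)

# Drop the leading NA from the rolling sum, then take the share of
# days where all nSum days in the window were above the median
z <- na.omit(y)
prob <- sum(z[, "runSum2"] >= nSum) / NROW(z)
```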


Source: https://habr.com/ru/post/1273604/

