Calculating the size of winning and losing streaks

I am trying to calculate the size of winning and losing streaks. This question continues my previous one, in which I tried to calculate the length of a streak.

Here's what my data looks like:

    > subRes
       Instrument TradeResult.Currency.
    1         JPM                    -3
    2         JPM                   264
    3         JPM                   284
    4         JPM                    69
    5         JPM                   283
    6         JPM                  -219
    7         JPM                   -91
    8         JPM                   165
    9         JPM                   -35
    10        JPM                  -294
    11        KFT                    -8
    12        KFT                   -48
    13        KFT                   125
    14        KFT                  -150
    15        KFT                  -206
    16        KFT                   107
    17        KFT                   107
    18        KFT                    56
    19        KFT                   -26
    20        KFT                   189
    > dput(subRes)
    structure(list(Instrument = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("JPM",
    "KFT"), class = "factor"), TradeResult.Currency. = c(-3, 264,
    284, 69, 283, -219, -91, 165, -35, -294, -8, -48, 125, -150,
    -206, 107, 107, 56, -26, 189)), .Names = c("Instrument", "TradeResult.Currency."
    ), class = "data.frame", row.names = c(NA, 20L))

My goal: I want to calculate the size of the longest winning and losing streaks for each instrument. For JPM, the longest winning streak is rows 2, 3, 4, and 5 of the data above, which gives the TradeResult.Currency. values 264 + 284 + 69 + 283, for a total of 900. The longest losing streak for JPM is rows 9 and 10, which gives a total of -329 (-35 + -294). For KFT, the size of the longest winning streak is 270 (107 + 107 + 56, rows 16 to 18), and the size of the longest losing streak is -356 (-150 + -206, rows 14 and 15).

The following function gives the correct size of the winning streak ...

    WinStreakSize <- function(x){
        df.rle <- ifelse(x > 0, 1, 0)
        df.rle <- rle(df.rle)
        wh <- which(df.rle$lengths == max(df.rle$lengths))
        mx <- df.rle$lengths[wh]
        suma <- df.rle$lengths[1:wh]
        out <- x[(sum(suma) - (suma[length(suma)] - 1)):sum(suma)]
        return(sum(out))
    }

... with this result:

    > with(subRes, tapply(TradeResult.Currency., Instrument, WinStreakSize)
    + )
    JPM KFT 
    900 270 

However, I cannot adapt this function to return the size of the longest losing streak (so that it outputs -329 for JPM and -356 for KFT), however stupid that sounds. I have tried changing the function in various ways, taking it apart and rebuilding it, and I cannot find the cause.

Here's what I mean (output from debugging the function, where x holds the JPM values from subRes):

    Browse[2]> ifelse(x > 0, 1, 0)
     [1] 0 1 1 1 1 0 0 1 0 0
    Browse[2]> ifelse(x < 0, 1, 0)
     [1] 1 0 0 0 0 1 1 0 1 1
    Browse[2]> rle( ifelse(x > 0, 1, 0))
    Run Length Encoding
      lengths: int [1:5] 1 4 2 1 2
      values : num [1:5] 0 1 0 1 0
    Browse[2]> rle( ifelse(x < 0, 1, 0))
    Run Length Encoding
      lengths: int [1:5] 1 4 2 1 2
      values : num [1:5] 1 0 1 0 1
    Browse[2]> inverse.rle( ifelse(x > 0, 1, 0))
    Error in x$lengths : $ operator is invalid for atomic vectors
    Browse[2]> rle( !ifelse(x < 0, 1, 0))
    Run Length Encoding
      lengths: int [1:5] 1 4 2 1 2
      values : logi [1:5] FALSE TRUE FALSE TRUE FALSE

Thus, changing the condition in this function does not affect its output. This suggests that I am looking at the wrong part of the function, but the ifelse statement is the very first line of the function. In other words, from line 1 onwards the function seems to use the wrong input, despite the changed condition.

What obvious point am I missing?

1 answer

rle(ifelse(x>0,1,0)) is essentially the same as rle(ifelse(x<0,1,0)), rle(x>0), or rle(x<0); the only difference is the values assigned to the runs. But your function never uses the run values, so the change makes no difference. Because you select by length and not by value, you get the same result every time.
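As a minimal sketch of this point, the snippet below uses the JPM values from the question (the variable names `jpm`, `win`, and `lose` are introduced here purely for illustration). Both encodings have identical run lengths, so code that inspects only `$lengths` cannot tell wins from losses.

```r
# JPM trade results copied from the question's data
jpm <- c(-3, 264, 284, 69, 283, -219, -91, 165, -35, -294)

win  <- rle(jpm > 0)   # TRUE marks winning runs
lose <- rle(jpm < 0)   # TRUE marks losing runs

# The run lengths are the same in both encodings
identical(win$lengths, lose$lengths)   # TRUE

# Only the run values differ; here one is the negation of the other,
# because no trade result is exactly zero
win$values
lose$values

# Restricting the search to losing runs requires using the values
max(win$lengths[!win$values])          # 2
```

Filtering the lengths by the run values, as in the last line, is what the original function is missing.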

Let me simplify things a bit. With the function below I demonstrate how to calculate both the run lengths and the sums. Keep in mind that the expected result in your question is not quite accurate: for JPM there are two longest losing runs. I decided to return only the one with the largest absolute value.

    MaxStreakSize <- function(x){
        # Get the run lengths and values
        df.rle <- rle(x>0)
        ngroups <- length(df.rle$lengths)
        ll <- df.rle$lengths
        val <- df.rle$values

        # calculate the sums
        id <- rep(1:ngroups,ll)
        sums <- tapply(x,id,sum)

        # find the largest runs for positive (val) and negative (!val)
        rmax <- which(ll==max(ll[val]) & val )
        rmin <- which(ll==max(ll[!val]) & !val )

        out <- list(
                "Lose"=c("length"=max(ll[rmin]),
                    "sum"=min(sums[rmin])),
                "Win"=c("length"=max(ll[rmax]),
                    "sum"=max(sums[rmax]))
                )
        return(out)
    }

In problems like this, it is very useful to build an index from the number of groups and the lengths of the runs. It makes life much easier: I can then calculate sums, means, and so on with a simple tapply. Once I have three vectors of the same length (ll, sums, and val), I can easily relate the length, value, and sum of each run and select whatever I want to output.
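The index idea can be sketched on its own, here with the KFT values from the question. The names `kft`, `runs`, `run_sums`, and `run_means` are illustrative and not part of the function above.

```r
# KFT trade results copied from the question's data
kft <- c(-8, -48, 125, -150, -206, 107, 107, 56, -26, 189)

runs <- rle(kft > 0)

# One run id per observation: 1 1 2 3 3 4 4 4 5 6
id <- rep(seq_along(runs$lengths), runs$lengths)

# Any per-run statistic is now a single tapply call
run_sums  <- tapply(kft, id, sum)
run_means <- tapply(kft, id, mean)

# Lengths, values, and per-run statistics line up by position
data.frame(length = runs$lengths,
           win    = runs$values,
           sum    = as.vector(run_sums),
           mean   = as.vector(run_means))
```

The fourth row of the resulting table is the winning run of length 3 with sum 270, and the third row is the losing run of length 2 with sum -356, matching the figures in the question.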

The advantage of using rle(x>0) is that the run values are logical, so you can use them directly as an index, which greatly simplifies the work.
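A short sketch of that indexing, again on the JPM values from the question (the name `jpm` is illustrative; `ll` and `val` mirror the names used in the function above):

```r
# JPM trade results copied from the question's data
jpm <- c(-3, 264, 284, 69, 283, -219, -91, 165, -35, -294)

runs <- rle(jpm > 0)
ll   <- runs$lengths   # 1 4 2 1 2
val  <- runs$values    # FALSE TRUE FALSE TRUE FALSE

ll[val]    # lengths of the winning runs: 4 1
ll[!val]   # lengths of the losing runs:  1 2 2

# Positions of the longest losing runs; JPM has a tie (runs 3 and 5)
which(ll == max(ll[!val]) & !val)
```

The tie in the last line is why the function has to pick one of the two losing runs, here by taking the smaller sum.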


Source: https://habr.com/ru/post/1336283/

