How to create a conditional dummy variable in R?

I have a data frame of time series data with daily temperature observations. I need to create a dummy variable that counts every day with a temperature above the 5 °C threshold. This would be easy in itself, but there is an additional condition: counting starts only after ten consecutive days above the threshold. Here's an example data frame:

df <- data.frame(date = seq(365), temp = -30 + 0.65*seq(365) - 0.0018*seq(365)^2 + rnorm(365)) 

I think I did it, but with too many loops for my liking. This is what I did:

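    df$dummyUnconditional <- 0
    df$dummyHead <- 0
    df$dummyTail <- 0

    # 1 on every day above the threshold
    for (i in 1:nrow(df)) {
      if (df$temp[i] > 5) {
        df$dummyUnconditional[i] <- 1
      }
    }

    # 1 where a window of ten days above the threshold starts
    for (i in 1:(nrow(df) - 9)) {
      if (sum(df$dummyUnconditional[i:(i + 9)]) == 10) {
        df$dummyHead[i] <- 1
      }
    }

    # 1 where a window of ten days above the threshold ends
    # (starts at 10 so that (i - 9) never indexes position 0)
    for (i in 10:nrow(df)) {
      if (sum(df$dummyUnconditional[(i - 9):i]) == 10) {
        df$dummyTail[i] <- 1
      }
    }

    df$dummyConditional <- ifelse(df$dummyHead == 1 | df$dummyTail == 1, 1, 0)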

Can anyone suggest simpler ways to do this?

3 answers

Here's a base R option using rle:

 df$dummy <- with(rle(df$temp > 5), rep(as.integer(values & lengths >= 10), lengths)) 

Some explanation: the task is a classic use case for the run-length encoding (rle) function, imo. First, check whether temp is greater than 5 (creating a logical vector) and apply rle to this vector. As a result we get:

    > rle(df$temp > 5)
    # Run Length Encoding
    #   lengths: int [1:7] 66 1 1 225 2 1 69
    #   values : logi [1:7] FALSE TRUE FALSE TRUE FALSE TRUE ...

Now we want to find those cases where values is TRUE (i.e., temp is greater than 5) and where, at the same time, lengths is at least 10 (i.e., at least ten consecutive temp values are greater than 5). We do this by running:

 values & lengths >= 10 

And finally, since we want to return a vector of the same length as nrow(df), we use rep(..., lengths) to expand the result back to one value per row, and as.integer to return 1/0 instead of TRUE/FALSE.
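
To make the steps concrete, here is a small sketch on a made-up toy vector. The vector `temp` and the shorter minimum run length `min_run` (3 instead of 10) are assumptions chosen only to keep the output short; they are not from the question.

```r
# Toy data (hypothetical): threshold 5, minimum run length 3 instead of 10
temp <- c(6, 7, 2, 8, 9, 10, 11, 3, 6)
min_run <- 3

# Step 1: run-length encode the logical vector
r <- rle(temp > 5)
r$lengths  # 2 1 4 1 1
r$values   # TRUE FALSE TRUE FALSE TRUE

# Step 2: keep only runs that are above the threshold AND long enough
keep <- r$values & r$lengths >= min_run  # FALSE FALSE TRUE FALSE FALSE

# Step 3: expand back to one value per observation
dummy <- rep(as.integer(keep), r$lengths)
dummy  # 0 0 0 1 1 1 1 0 0
```

Note that this flags every day of a qualifying run, including the first days of that run, and that the short runs at positions 1-2 and 9 are dropped.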


I think you could use a combination of a simple ifelse and the rolling functions in the zoo package to achieve what you are looking for. The final step simply pads the result to account for the first N-1 days, when there is not enough information to fill the window.

    library(zoo)

    df <- data.frame(date = seq(365), temp = -30 + 0.65*seq(365) - 0.0018*seq(365)^2 + rnorm(365))
    df$above5 <- ifelse(df$temp > 5, 1, 0)
    temp <- rollapply(df$above5, 10, sum)
    df$conseq <- c(rep(0, 9), temp)
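
The padding step may be easier to see on a small example. The sketch below does not use zoo; it computes the same kind of rolling sum in base R with cumsum, on a hypothetical toy vector with a window of 3 instead of 10. The final comparison with the window width is an assumption about how the rolling sum could be turned into a 0/1 dummy, and is not part of the answer itself.

```r
# Toy data (hypothetical): 1 = above threshold; window of 3 instead of 10
above <- c(1, 1, 0, 1, 1, 1, 1, 0, 1)
k <- 3
n <- length(above)

# Rolling sum over each complete window of k values, via cumulative sums
cs <- c(0, cumsum(above))
roll <- cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]
length(roll)  # n - k + 1 = 7, so the first k - 1 days have no value

# Pad the front so the result lines up with the original rows
conseq <- c(rep(0, k - 1), roll)
conseq  # 0 0 2 2 2 3 3 2 2

# Assumed follow-up: 1 where the last k days were all above the threshold
flag <- as.integer(conseq == k)
flag  # 0 0 0 0 0 1 1 0 0
```

The rolling sum on its own ranges from 0 to the window width, so it is a count per window rather than a dummy until it is compared with the width.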

I would do this:

    set.seed(42)
    df <- data.frame(date = seq(365), temp = -30 + 0.65*seq(365) - 0.0018*seq(365)^2 + rnorm(365))

    thr <- 5
    df$dum <- 0

    # find first 10 consecutive values above threshold
    # (stats:: avoids masking by other packages that define filter)
    test1 <- stats::filter(df$temp > thr, rep(1, 10), sides = 1) == 10L
    test1[1:9] <- FALSE
    n <- which(test1)[1]  # last day of the first 10-day run

    # count days above threshold after that
    df$dum[(n+1):nrow(df)] <- cumsum(df$temp[(n+1):nrow(df)] > thr)
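
For checking the behaviour on small inputs, the same idea can be wrapped in a function. This is only a sketch: the name `count_after_run`, its arguments, and the toy vector are hypothetical, and the early return for series without a qualifying run is an added assumption about what should happen in that case.

```r
# Hypothetical helper: running count of days above `thr`, starting after the
# first run of `k` consecutive days above it (the run itself is not counted)
count_after_run <- function(temp, thr = 5, k = 10) {
  above <- temp > thr
  out <- numeric(length(temp))
  # One-sided moving sum; equals k where the last k values were all above
  hit <- as.numeric(stats::filter(above, rep(1, k), sides = 1)) == k
  hit[is.na(hit)] <- FALSE  # the first k - 1 positions have no full window
  n <- which(hit)[1]        # last day of the first qualifying run
  if (is.na(n) || n == length(temp)) return(out)  # nothing left to count
  idx <- (n + 1):length(temp)
  out[idx] <- cumsum(above[idx])
  out
}

# Toy data (hypothetical), with a run length of 3 instead of 10
temp <- c(6, 7, 2, 8, 9, 10, 11, 3, 6)
res <- count_after_run(temp, thr = 5, k = 3)
res  # 0 0 0 0 0 0 1 1 2
```

Unlike the rle answer, which returns a 0/1 flag per day, this returns a cumulative count that only starts once the first qualifying run is complete.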

Source: https://habr.com/ru/post/1242002/

