How to select runs of consecutive non-zero values?

I have two vectors:

    x <- c(0, 1, 0, 2, 3, 0, 1, 1, 0, 2)
    y <- c("00:01:00", "00:02:00", "00:03:00", "00:04:00", "00:05:00",
           "00:06:00", "00:07:00", "00:08:00", "00:09:00", "00:10:00")

I need to select only the elements of y at positions where the x values form an unbroken run of non-zero values (at least two in a row, not interrupted by 0). As a result, I would like to get a data frame like this:

           y x
    00:04:00 2
    00:05:00 3
    00:07:00 1
    00:08:00 1

I wrote a script like this, but on a large dataset it is slow. Is there a more elegant solution? I also wonder why df<-rbind(bbb,df) returns df in reverse order.

    aaa <- data.frame(y, x)
    df <- NULL
    for (i in 1:length(aaa$x)) {
      bbb <- ifelse((aaa$x[i] * aaa$x[i+1]) != 0, aaa$x[i],
                    ifelse((aaa$x[i] * aaa$x[i-1]) != 0, aaa$x[i], NA))
      df <- rbind(bbb, df)
    }
    df <- data.frame(rev(df))
    aaa$x <- df$rev.df.
    bbb <- na.omit(aaa)
    bbb
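For comparison, the condition tested in the loop above (a non-zero value with a non-zero neighbour on at least one side) can be computed for the whole vector at once by comparing it with shifted copies of itself. This also avoids indexing past the ends of the vector (aaa$x[0] is a zero-length vector and aaa$x[11] is NA). A minimal base-R sketch, assuming x contains no NA values; the names nz, prev_nz, next_nz and keep are illustrative only:

```r
x <- c(0, 1, 0, 2, 3, 0, 1, 1, 0, 2)
y <- c("00:01:00", "00:02:00", "00:03:00", "00:04:00", "00:05:00",
       "00:06:00", "00:07:00", "00:08:00", "00:09:00", "00:10:00")

nz <- x != 0                        # TRUE where x is non-zero
prev_nz <- c(FALSE, head(nz, -1))   # flag of the previous element (nothing before the first)
next_nz <- c(tail(nz, -1), FALSE)   # flag of the next element (nothing after the last)

# keep a value if it is non-zero and has at least one non-zero neighbour,
# i.e. it belongs to a run of two or more non-zero values
keep <- nz & (prev_nz | next_nz)

result <- data.frame(y, x)[keep, ]
result
```

This reproduces the loop's rule exactly, but it only works for a minimum run length of 2; for longer minimum runs the rle approach is more general.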

I am new to R, so please explain in as much detail as possible :) Thanks!

1 answer
    aaa <- data.frame(y, x)
    rles <- rle(aaa$x == 0)
    bbb <- aaa[rep(rles$values == FALSE & rles$lengths >= 2, rles$lengths), ]

which gives

    > bbb
             y x
    4 00:04:00 2
    5 00:05:00 3
    7 00:07:00 1
    8 00:08:00 1

As for your other question: df<-rbind(bbb,df) returns df in reverse order because you add the new row (bbb) before the existing rows. Swap the order of the arguments (df<-rbind(df,bbb)) and you will not need to reverse df.
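A small demonstration of the difference, using a toy loop over 1:3 rather than the question's data (the names df_prepend and df_append are illustrative):

```r
# Prepending vs appending rows with rbind() inside a loop
df_prepend <- NULL
df_append  <- NULL
for (v in 1:3) {
  df_prepend <- rbind(v, df_prepend)  # new row goes on top, so rows end up reversed
  df_append  <- rbind(df_append, v)   # new row goes at the bottom, order is preserved
}
as.vector(df_prepend)  # 3 2 1
as.vector(df_append)   # 1 2 3
```

Either way, each rbind() call copies everything accumulated so far, so growing an object row by row gets slower as the data grows. That is one reason a vectorized approach like the rle one is much faster on large data; if a loop is really needed, preallocating the result (for example out <- numeric(length(x))) and filling it by index avoids the repeated copying.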

Now let's break the answer down, since it has several parts. First, to paraphrase your criterion: you want the stretches of aaa that contain no 0 and are at least 2 rows long. So the first step is to find the zeros:

    > aaa$x == 0
    [1]  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE

Then you need the length of each run of identical values; that is exactly what rle (run-length encoding) computes:

    > rle(aaa$x == 0)
    Run Length Encoding
      lengths: int [1:8] 1 1 1 2 1 2 1 1
      values : logi [1:8] TRUE FALSE TRUE FALSE TRUE FALSE ...

This means there was 1 TRUE, then 1 FALSE, then 1 TRUE, then 2 FALSEs, and so on. This result is stored in rles. The runs we want are those whose value is FALSE (not 0) and whose length is 2 or more:

    > rles$values == FALSE & rles$lengths >= 2
    [1] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE

This needs to be expanded back to the length of aaa. rep does this, using rles$lengths to repeat each run's entry as many times as that run is long:

    > rep(rles$values == FALSE & rles$lengths >= 2, rles$lengths)
    [1] FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE

This gives a logical vector, one element per row, that can be used to index the rows of aaa:

    > aaa[rep(rles$values == FALSE & rles$lengths >= 2, rles$lengths), ]
             y x
    4 00:04:00 2
    5 00:05:00 3
    7 00:07:00 1
    8 00:08:00 1
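The steps above can be wrapped in a small reusable function. This is only a sketch: the name select_nonzero_runs and its col and min_len parameters are hypothetical, and it assumes the column has no NA values. Instead of rep(..., rles$lengths) it uses base R's inverse.rle(), which performs the same expansion of an rle object back to full length:

```r
# Hypothetical helper: keep the rows of `df` that lie in runs of at least
# `min_len` consecutive non-zero values in column `col` (assumes no NA values)
select_nonzero_runs <- function(df, col = "x", min_len = 2) {
  runs <- rle(df[[col]] == 0)                            # runs of zero / non-zero values
  runs$values <- !runs$values & runs$lengths >= min_len  # TRUE only for long non-zero runs
  keep <- inverse.rle(runs)                              # expand to one flag per row
  df[keep, , drop = FALSE]
}

x <- c(0, 1, 0, 2, 3, 0, 1, 1, 0, 2)
y <- c("00:01:00", "00:02:00", "00:03:00", "00:04:00", "00:05:00",
       "00:06:00", "00:07:00", "00:08:00", "00:09:00", "00:10:00")
aaa <- data.frame(y, x)

select_nonzero_runs(aaa)               # rows 4, 5, 7, 8
select_nonzero_runs(aaa, min_len = 1)  # every non-zero row: 2, 4, 5, 7, 8, 10
```

Making the minimum run length a parameter is the main advantage over testing the immediate neighbours of each element, which only works for runs of at least 2.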

Source: https://habr.com/ru/post/1438518/

