Keeping only the 0s and the first 1 of each run in a sequence of 0s and 1s in R?

I have a sequence of 0s and 1s as follows:

xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1) 

I want to keep all the 0s, but only the first 1 of each run of consecutive 1s.

The result should be:

 ans <- c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1) 

What is the fastest way to do this in R?

+6
5 answers

Use rle() to get the lengths and values of the runs, set the length of every run of 1s to 1, and then put the run-length encoding back together with inverse.rle().

    rr <- rle(xx)
    rr$lengths[rr$values == 1] <- 1
    inverse.rle(rr)
    # [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
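
To make the intermediate steps visible, here is a small sketch that prints what rle() produces for the vector from the question before and after the lengths are modified. The name `res` is introduced here only for illustration.

```r
xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)

rr <- rle(xx)
rr$lengths  # 4 2 5 3 4 4 3 1
rr$values   # 0 1 0 1 0 1 0 1

# Shrink every run of 1s to a single element; runs of 0s keep their length
rr$lengths[rr$values == 1] <- 1
rr$lengths  # 4 1 5 1 4 1 3 1

# Rebuild the vector from the modified run-length encoding
res <- inverse.rle(rr)
res
```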
+16

Here is one way:

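    idx <- which(xx == 1)                    # positions of all 1s
    pos <- which(diff(c(xx[1], idx)) == 1)   # 1s whose predecessor is also a 1
    xx[-idx[pos]]                            # following Frank's suggestion
    # [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
    # Caveat: if no 1 directly follows another 1, idx[pos] is empty and
    # xx[-integer(0)] returns an empty vector; the setdiff() form used in
    # the benchmark avoids this.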
+8

Without rle:

    xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]
    # [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1

Since the OP asked about speed, here is a benchmark:
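
The one-liner above can be read as a logical mask: an element is kept if it is not a 1, or if the element before it is not a 1. A minimal sketch that spells this out step by step (the names `not_one`, `prev_not_one`, `keep` and `res` are illustrative, not from the answer):

```r
xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)

# TRUE wherever the element itself is not a 1
not_one <- xx != 1

# Shift the mask right by one position: prev_not_one[i] says whether
# element i - 1 is not a 1. The first element has no predecessor, so TRUE.
prev_not_one <- head(c(TRUE, not_one), -1)

# Keep every non-1, plus every 1 that does not directly follow another 1
keep <- not_one | prev_not_one
res <- xx[keep]
res
```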

    josh = function(xx) {
      rr <- rle(xx)
      rr$lengths[rr$values == 1] <- 1
      inverse.rle(rr)
    }

    arun = function(xx) {
      idx <- which(xx == 1)
      pos <- which(diff(c(xx[1], idx)) == 1)
      xx[setdiff(seq_along(xx), idx[pos])]
    }

    eddi = function(xx) {
      xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]
    }

    simon = function(xx) {
      # The body of the function is supplied in @SimonO101's answer
      first1(xx)
    }

    set.seed(1)
    N = 1e6
    xx = sample(c(0, 1), N, replace = TRUE)

    library(microbenchmark)
    bm <- microbenchmark(josh(xx), arun(xx), eddi(xx), simon(xx), times = 25)
    print(bm, digits = 2, order = "median")
    # Unit: milliseconds
    #       expr min  lq median  uq max neval
    #  simon(xx)  20  21     23  26  72    25
    #   eddi(xx)  97 102    104 118 149    25
    #   arun(xx) 205 245    253 258 332    25
    #   josh(xx) 228 268    275 287 365    25
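
A benchmark is only meaningful if the candidates return the same result. The following sketch checks that for the three base R versions on a smaller random input. The Rcpp version is left out so the check runs without a compiler, and the input size of 1e4 and the names `small`, `all_agree` and `no_repeated_ones` are assumptions made for this example.

```r
josh <- function(xx) {
  rr <- rle(xx)
  rr$lengths[rr$values == 1] <- 1
  inverse.rle(rr)
}

arun <- function(xx) {
  idx <- which(xx == 1)
  pos <- which(diff(c(xx[1], idx)) == 1)
  xx[setdiff(seq_along(xx), idx[pos])]
}

eddi <- function(xx) {
  xx[head(c(TRUE, xx != 1), -1) | (xx != 1)]
}

set.seed(1)
small <- sample(c(0, 1), 1e4, replace = TRUE)

res_josh <- josh(small)
res_arun <- arun(small)
res_eddi <- eddi(small)

# All three implementations should produce exactly the same vector
all_agree <- identical(res_josh, res_arun) && identical(res_josh, res_eddi)

# In a correct result no 1 is immediately followed by another 1
no_repeated_ones <- !any(head(res_josh, -1) == 1 & tail(res_josh, -1) == 1)

all_agree
no_repeated_ones
```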
+7

Here is a quick Rcpp solution. It should be fast (but I don't know how it will stack up against others here) ...

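    Rcpp::cppFunction('
    std::vector<int> first1(IntegerVector x) {
      std::vector<int> out;
      for (IntegerVector::iterator it = x.begin(); it != x.end(); ++it) {
        // Keep every 0. Keep a 1 only if it is the first element or the
        // previous element is not 1; the begin() check avoids reading
        // before the start of the vector.
        if (*it == 0 || (*it == 1 && (it == x.begin() || *(it - 1) != 1)))
          out.push_back(*it);
      }
      return out;
    }')

    first1(xx)
    # [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1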
+3

Even though I am a staunch supporter of rle, it's Friday, so here is an alternative method. I did it for fun, so YMMV.

    yy <- paste(xx, collapse = '')
    zz <- gsub('[1]{1,}', '1', yy)   # I probably screwed up the regex here
    aa <- as.numeric(strsplit(zz, '')[[1]])
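
As a quick check of the regex, this sketch runs the same steps on the vector from the question, using the shorter pattern `1+`, which matches the same thing as `[1]{1,}`. It also shows a limitation: because the vector is pasted into a single string and split back into characters, the approach only works when every element is a single digit. The vector `bad` is a made-up example for that case.

```r
xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)

yy <- paste(xx, collapse = "")            # one string of 0s and 1s
zz <- gsub("1+", "1", yy)                 # collapse each run of 1s to a single 1
aa <- as.numeric(strsplit(zz, "")[[1]])   # back to a numeric vector
aa

# Limitation: a multi-digit element is broken up by the paste/split round trip
bad <- c(10, 1, 1)
bad_res <- as.numeric(strsplit(gsub("1+", "1", paste(bad, collapse = "")), "")[[1]])
bad_res   # 1 0 1, not the intended 10 1
```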
+2

Source: https://habr.com/ru/post/954271/

