Since the question originally had the bioinformatics tag, I'll mention the Bioconductor package IRanges (and its companion for ranges on genomes, GenomicRanges):
    > library(IRanges)
    > xx <- c(1,1,1,1,1,1,0,0,0,0,1,1,1,1)
    > sl = slice(Rle(xx), 1)
    > sl
    Views on a 14-length Rle subject

    views:
        start end width
    [1]     1   6     6 [1 1 1 1 1 1]
    [2]    11  14     4 [1 1 1 1]
which could be coerced to a matrix, although a matrix would often not be convenient for whatever the next step is:
    > matrix(c(start(sl), end(sl)), ncol=2)
         [,1] [,2]
    [1,]    1    6
    [2,]   11   14
Other operations might start from the Rle itself, for example:
    > xx = c(2,2,2,3,3,3,0,0,0,0,4,4,1,1)
    > r = Rle(xx)
    > m = cbind(start(r), end(r))[runValue(r) != 0,,drop=FALSE]
    > m
         [,1] [,2]
    [1,]    1    3
    [2,]    4    6
    [3,]   11   12
    [4,]   13   14
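If Bioconductor is not installed, the same run boundaries can be approximated with base R's `rle()`. This is only a sketch of the idea, not how IRanges implements it, and the helper name `run_bounds` is made up for illustration.

```r
# Hypothetical helper (not part of IRanges): start/end positions of the
# non-zero runs of a vector, using base R only.
run_bounds <- function(xx) {
  r <- rle(xx)                     # run lengths and run values
  end <- cumsum(r$lengths)         # last position of each run
  start <- end - r$lengths + 1L    # first position of each run
  keep <- r$values != 0            # drop the runs of zeros
  cbind(start = start[keep], end = end[keep])
}

xx <- c(2,2,2,3,3,3,0,0,0,0,4,4,1,1)
m <- run_bounds(xx)
m
```

The result has one row per non-zero run, matching the rows of the matrix built from `start(r)` and `end(r)` above.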
See the help page ?Rle for the full flexibility of the Rle class. To go from a matrix like the one above to a new Rle, as asked in the comment below, one can create a new Rle of the appropriate length and then subset-assign using an IRanges as the index:
    > r = Rle(0L, max(m))
    > r[IRanges(m[,1], m[,2])] = 1L
    > r
    integer-Rle of length 14 with 3 runs
      Lengths: 6 4 4
      Values : 1 0 1
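For comparison, the subset-assignment step can be sketched on an ordinary integer vector in base R. This assumes the matrix holds start positions in column 1 and end positions in column 2; it expands every range into explicit indices, so it lacks the compactness of the Rle version.

```r
# Start/end matrix as produced in the example above (values typed in by hand).
m <- cbind(c(1, 4, 11, 13), c(3, 6, 12, 14))

v <- integer(max(m))                      # all zeros, long enough for the last range
idx <- unlist(Map(seq, m[, 1], m[, 2]))   # every position covered by some range
v[idx] <- 1L                              # mark covered positions
v
```

Adjacent ranges such as 1-3 and 4-6 simply merge into one run of 1s, which is why the Rle above reports three runs rather than five.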
One could expand this to a full vector
    > as(r, "integer")
     [1] 1 1 1 1 1 1 0 0 0 0 1 1 1 1
but often it's better to continue the analysis on the Rle. The class is very flexible, so one way to go from xx to an integer vector of 1s and 0s is
    > as(Rle(xx) > 0, "integer")
     [1] 1 1 1 1 1 1 0 0 0 0 1 1 1 1
Again, it often makes sense to stay in the Rle space. And Arun's answer to your separate question is probably best.
Performance (speed) is important, although in this case I think the flexibility of the Rle class would outweigh somewhat poorer performance, and ending up with a matrix is an unlikely endpoint for a typical analysis. Nonetheless, the IRanges infrastructure is performant:
    eddi <- function(xx)
        matrix(which(diff(c(0,xx,0)) != 0) - c(0,1), ncol = 2, byrow = TRUE)

    iranges = function(xx) {
        sl = slice(Rle(xx), 1)
        matrix(c(start(sl), end(sl)), ncol=2)
    }

    iranges.1 = function(xx) {
        r = Rle(xx)
        cbind(start(r), end(r))[runValue(r) != 0, , drop=FALSE]
    }
with
    > xx = sample(c(0, 1), 1e5, TRUE)
    > microbenchmark(eddi(xx), iranges(xx), iranges.1(xx), times=10)
    Unit: milliseconds
              expr       min        lq    median        uq      max neval
          eddi(xx)  45.88009  46.69360  47.67374 226.15084 234.8138    10
       iranges(xx) 112.09530 114.36889 229.90911 292.84153 294.7348    10
     iranges.1(xx)  31.64954  31.72658  33.26242  35.52092 226.7817    10
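As an aside, the `eddi()` one-liner is dense, so here is a step-by-step walk-through on the small 0/1 vector used at the top of this answer. It is base R only. The intermediate names `d`, `change` and `bounds` are introduced here for illustration and are not part of the original function.

```r
xx <- c(1,1,1,1,1,1,0,0,0,0,1,1,1,1)

# Pad with a zero on each side so runs touching either end still produce a change.
d <- diff(c(0, xx, 0))      # +1 where a run of 1s starts, -1 just after it ends

change <- which(d != 0)     # alternates: start, end + 1, start, end + 1, ...

# c(0, 1) is recycled, so every second value (the "end + 1" ones) is shifted back by 1.
bounds <- change - c(0, 1)

res <- matrix(bounds, ncol = 2, byrow = TRUE)   # one row per run: start, end
res
```

Note that this relies on the input being 0/1. With several distinct non-zero values, as in the `c(2,2,2,3,3,3,...)` example, `diff()` is also non-zero where one non-zero run meets another, so the start/end pairing no longer holds.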