How to count the repeating part of a sequence in R?

Question

Is it possible to count the repeating part of a sequence in R? For instance:

    x <- c(1, 3.0, 3.1, 3.2, 1, 1, 2, 3.0, 3.1, 3.2, 4, 4, 5, 6, 5, 3.0, 3.1, 3.2,
           3.1, 2, 1, 4, 6, 4.0, 4, 3.0, 3.1, 3.2, 5, 3.2, 3.0, 4)

Is it possible to count how many times the subsequence 3.0, 3.1, 3.2 occurs? In this example, the answer should be 4.

+4

r count sequence

user2531964 Jun 28 '13 at 13:39

4 answers

Another approach (a general moving window):

    x <- c(1, 3.0, 3.1, 3.2, 1, 1, 2, 3.0, 3.1, 3.2, 4, 4, 5, 6, 5, 3.0, 3.1, 3.2,
           3.1, 2, 1, 4, 6, 4.0, 4, 3.0, 3.1, 3.2, 5, 3.2, 3.0, 4)
    s <- c(3, 3.1, 3.2)
    sum(apply(embed(x, length(s)), 1, function(y) {all(y == rev(s))}))
    # [1] 4

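Look at the output of embed to see what is happening: each row holds one window of x, in reverse order, which is why the comparison is against rev(s).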
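To make the window layout concrete, here is a small sketch on a shortened vector (x_small is a made-up example, not the data from the question). It only illustrates how embed arranges the windows.

```r
# Hypothetical short vector, used only to show the layout of embed()
x_small <- c(1, 3, 3.1, 3.2, 4)
s <- c(3, 3.1, 3.2)

# One row per window of length 3; each row lists the window back to front:
#   row 1: 3.1 3.0 1.0   (window x_small[1:3])
#   row 2: 3.2 3.1 3.0   (window x_small[2:4])
#   row 3: 4.0 3.2 3.1   (window x_small[3:5])
windows <- embed(x_small, length(s))

# Only the second window equals the reversed pattern
matches <- apply(windows, 1, function(y) all(y == rev(s)))
which(matches)
```

Here only row 2 matches, so the pattern starts at position 2 of x_small.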

As Arun points out, apply is pretty slow here. You can combine embed with Arun's matrix trick to get the same result much faster:

    sum(colSums(matrix(embed(x, length(s)), byrow = TRUE,
                       nrow = length(s)) == rev(s)) == length(s))

+3

eddi Jun 28 '13 at 15:08

You can turn it into a string and use gregexpr.

    sum(gregexpr("3 3.1 3.2", paste(x, collapse=" "), fixed=TRUE)[[1]] != -1)
    # [1] 4
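The comparison with -1 is there because gregexpr reports "no match" as -1 rather than as an empty result. The sketch below, on made-up inputs, shows that case. It also shows a caveat of matching on text: the pattern can be found inside longer numbers, so this approach is probably best kept for data where that cannot happen.

```r
pattern_str <- "3 3.1 3.2"

# No occurrence: gregexpr() returns -1, so counting the result's length would give 1
no_hit <- gregexpr(pattern_str, "1 2 4", fixed = TRUE)[[1]]
count_none <- sum(no_hit != -1)      # 0

# Caveat (made-up data): "13 3.1 3.25" contains the characters "3 3.1 3.2"
tricky <- paste(c(13, 3.1, 3.25), collapse = " ")
false_hit <- gregexpr(pattern_str, tricky, fixed = TRUE)[[1]]
count_false <- sum(false_hit != -1)  # 1, although c(3, 3.1, 3.2) does not occur
```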

+2

Hong Ooi Jun 28 '13 at 13:50

Carl Witthoft's seqle function may be useful for you here.

The function is as follows:

    seqle <- function(x, incr = 1) {
      if (!is.numeric(x)) x <- as.numeric(x)
      n <- length(x)
      y <- x[-1L] != x[-n] + incr
      i <- c(which(y | is.na(y)), n)
      list(lengths = diff(c(0L, i)),
           values = x[head(c(0L, i) + 1L, -1L)])
    }

For your data, the result looks like this:

    temp <- seqle(x, incr = .1)
    temp
    # $lengths
    #  [1] 1 3 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1
    #
    # $values
    #  [1] 1.0 3.0 1.0 1.0 2.0 3.0 4.0 4.0 5.0 6.0 5.0 3.0 3.1 2.0 1.0 4.0
    # [17] 6.0 4.0 4.0 3.0 5.0 3.2 3.0 4.0

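How do we read this? lengths tells us that the vector consists of a run of length 1, then a run of length 3, then three runs of length 1, then another run of length 3, and so on, where a run is a stretch in which each value is 0.1 larger than the previous one. values gives the first value of each run: the first run of length 3 starts at 3.0, the next run of length 3 also starts at 3.0, etc.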

It is easier to see as a data.frame.

    data.frame(temp)[temp$lengths > 1, ]
    #    lengths values
    # 2        3      3
    # 6        3      3
    # 12       3      3
    # 20       3      3

In this example, all runs longer than 1 have the same length and start with the same value, so we can get your answer simply by counting the rows of the data.frame obtained above.
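If the runs were not all alike, the row count alone would not be enough. A minimal sketch of a stricter count, assuming the target is exactly a run of length 3 starting at 3.0, could check both conditions explicitly. The variable names runs and n_matches are illustrative, and the function is copied from above so the block runs on its own.

```r
# seqle as given in this answer, repeated so the example is self-contained
seqle <- function(x, incr = 1) {
  if (!is.numeric(x)) x <- as.numeric(x)
  n <- length(x)
  y <- x[-1L] != x[-n] + incr
  i <- c(which(y | is.na(y)), n)
  list(lengths = diff(c(0L, i)),
       values = x[head(c(0L, i) + 1L, -1L)])
}

x <- c(1, 3.0, 3.1, 3.2, 1, 1, 2, 3.0, 3.1, 3.2, 4, 4, 5, 6, 5, 3.0, 3.1, 3.2,
       3.1, 2, 1, 4, 6, 4.0, 4, 3.0, 3.1, 3.2, 5, 3.2, 3.0, 4)

runs <- seqle(x, incr = 0.1)

# Count only runs that have length 3 AND start at 3.0
n_matches <- sum(runs$lengths == 3 & runs$values == 3)
n_matches
```

Note that this counts whole runs only: a longer run such as 3.0, 3.1, 3.2, 3.3 contains the pattern but would not be counted by this condition.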

+2

A5C1D2H2I1M1N2O1R2T1 Jun 28 '13 at 14:34

Arun · Accepted Answer · 2013-06-28T13:58:03+0000

I would do something like this:

    pattern <- c(3, 3.1, 3.2)
    len1 <- seq_len(length(x) - length(pattern) + 1)
    len2 <- seq_len(length(pattern)) - 1
    sum(colSums(matrix(x[outer(len1, len2, '+')],
                       ncol = length(len1), byrow = TRUE) == pattern) == length(len2))

PS: if you replace sum with which, you get the start position of each match instead of the count.
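As a sketch of that variant, the same idea can be wrapped in a small helper. The name find_pattern and the guard for empty or over-long patterns are additions for illustration, not part of the answer above.

```r
# Hypothetical helper: start positions of every occurrence of `pattern` in `x`
find_pattern <- function(x, pattern) {
  n <- length(x)
  k <- length(pattern)
  if (k == 0L || k > n) return(integer(0))

  starts  <- seq_len(n - k + 1L)   # candidate start positions
  offsets <- seq_len(k) - 1L       # 0, 1, ..., k - 1

  # k x (n - k + 1) matrix: column j holds the window x[j:(j + k - 1)]
  windows <- matrix(x[outer(starts, offsets, "+")], nrow = k, byrow = TRUE)

  # pattern is recycled down each column, so each window is compared elementwise
  which(colSums(windows == pattern) == k)
}

x <- c(1, 3.0, 3.1, 3.2, 1, 1, 2, 3.0, 3.1, 3.2, 4, 4, 5, 6, 5, 3.0, 3.1, 3.2,
       3.1, 2, 1, 4, 6, 4.0, 4, 3.0, 3.1, 3.2, 5, 3.2, 3.0, 4)

starts <- find_pattern(x, c(3, 3.1, 3.2))
starts          # 2 8 16 26
length(starts)  # 4
```

Because the comparison uses ==, this relies on exact equality of the numbers; for values produced by arithmetic, a tolerance-based comparison may be more appropriate.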
