Frequencies of all subsequences of size 3 in a given 0-1 sequence?

Question

How can I find the frequencies of all subsequences of size 3 in a given 0-1 sequence?

Data

s<-c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)

I can count the 1s and 0s with table or ftable

 ftable(s,row.vars =1:1)

and the number of times 11, 01, 10 and 00 occur in s with

 table(s[-length(s)],s[-1])

What would be a smart way to count the occurrences of 111, 011, ..., 100, 000? Ideally, I would like a table of counts x, for example

      0  1
  11  x  x
  01  x  x
  10  x  x
  00  x  x

Is there a general way to count the occurrences of all possible subsequences of length k = 1,2,3,4, ... in the data? Thank you

+4

r count sequence

andrekos Feb 17 '10 at 7:22


2 answers

One approach is to build a data frame of the subsequences and then use the table function:

 s <- c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)
 n <- length(s)
 k <- 3
 subseqs <- t(sapply(1:(n-k+1), function(i){ s[i:(i+k-1)] }))
 colnames(subseqs) <- paste('Y', 1:k, sep="")
 subseqs <- data.frame(subseqs)
 table(subseqs)

It creates

 , , Y3 = 0

    Y2
 Y1  0 1
   0 4 1
   1 3 1

 , , Y3 = 1

    Y2
 Y1  0 1
   0 2 1
   1 0 1

Use ftable instead of table, or apply it to the output of table, to get a layout similar to the one in your question:

 ftable(subseqs)
       Y3 0 1
 Y1 Y2
 0  0     4 2
    1     1 1
 1  0     3 0
    1     1 1
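For a general k, the same idea can be sketched with base R's embed(), which builds the matrix of sliding windows in one call. This is only a sketch under stated assumptions: the helper name countSubseqs is hypothetical and is not part of the answer above. Note that embed() returns each window in reverse order, so the columns are flipped before pasting.

```r
# Hypothetical helper (not from the answer above): count every
# length-k window of a vector.
countSubseqs <- function(s, k) {
  # embed() puts the most recent value first, so reverse the columns
  windows <- embed(s, k)[, k:1, drop = FALSE]
  # collapse each row into a string such as "001" and tabulate
  table(apply(windows, 1, paste, collapse = ""))
}

s <- c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)
countSubseqs(s, 3)
```

Patterns that never occur in the data (101 here) are simply absent from the result rather than listed with a count of zero.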

+1

Jyotirmoy bhattacharya Feb 18 '10 at 9:13


Sharpie · Accepted Answer · 2010-02-17T20:38:50+0000

Well, it looks like you will first need to generate n-tuples from your vector. The following function should do that:

 makeTuples <- function( x, n ){
   # Very inefficient way to loop... but what the heck
   tuples <- list()
   for( i in 1:n ){
     tuples[[i]] <- x[i:(length(x)-n+i)]
   }
   return(tuples)
 }

You can then pass the results of makeTuples() to table() using do.call() :

 do.call( table, makeTuples(s,3) )
 , ,  = 0

     0 1
   0 4 1
   1 3 1

 , ,  = 1

     0 1
   0 2 1
   1 0 1

This works because makeTuples() returns a list of vectors, one per tuple position, and do.call() passes each vector to table() as a separate argument. The output is not laid out as nicely as you wanted, but you could write a function to reformat it, say from:

 , ,  = 0

     0 1
   0 4 1
   1 3 1

To:

      0 1
  00  4 1
  01  3 1

This would require looping over the outer n-2 dimensions of the n-dimensional array returned by table(), building the row names by concatenating the dimension labels.
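One way to avoid those loops, sketched here as an alternative to reshaping the array rather than as the approach described above, is to paste the first n-1 positions of each tuple into a row label before tabulating, so that table() only ever sees two factors. The name tupleTable is hypothetical, and makeTuples() is rewritten in a compact equivalent form only so that the snippet runs on its own.

```r
# Compact equivalent of makeTuples(): element i holds the i-th
# position of every tuple.
makeTuples <- function(x, n) {
  lapply(1:n, function(i) x[i:(length(x) - n + i)])
}

# Hypothetical helper: rows are the first n-1 symbols of each tuple,
# columns are the last symbol.
tupleTable <- function(x, n) {
  stopifnot(n >= 2)
  tuples <- makeTuples(x, n)
  # paste position-wise: element j is the prefix of the j-th tuple
  prefix <- do.call(paste, c(tuples[-n], sep = ""))
  table(prefix, last = tuples[[n]])
}

s <- c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)
tupleTable(s, 3)
```

Only prefixes and last symbols that actually occur in the data get a row or column, so a prefix that never appears is missing from the table instead of showing a row of zeros.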

Update

So, I was sitting in my stochastic processes class when I figured out a more or less straightforward way to create the result you want, without trying to unwind the output of table(). First you need a function that generates all possible permutations of n samples from your population. The permutations can be generated with expand.grid(), but this requires a little sugar:

 permute <- function( population, n ){
   permutations <- do.call( expand.grid, rep( list(population), n ) )
   permutations <- apply( permutations, 1, paste, collapse = '' )
   return( permutations )
 }
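To make the ordering concrete, here is what permute() yields for the 0-1 population. Strictly speaking these are all n-tuples drawn with repetition, so there are length(population)^n of them. expand.grid() varies its first column fastest, which is why the strings come out as 11, 01, 10, 00 rather than in sorted order. The function is repeated here only so that the snippet is self-contained.

```r
permute <- function(population, n) {
  # one copy of the population per position, then every combination
  permutations <- do.call(expand.grid, rep(list(population), n))
  # collapse each row of the grid into a single string
  apply(permutations, 1, paste, collapse = "")
}

permute(c(1, 0), 2)
# "11" "01" "10" "00"
```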

The main idea is to iterate over the list of permutations and count the number of tuples that match each one. Since you want the results laid out as a table, we generate permutations of n-1 elements from the population and let the last position form the columns of the table. Here is a function that takes a permutation of size n-1, a list of tuples, and the population the tuples were drawn from, and produces a named vector of counts:

 countFrequency <- function(permutation,tuples,population){
   permutations <- paste( permutation, population, sep = '' )
   # Inner lapply applies the equality operator `==` to each
   # permutation and returns a list of TRUE/FALSE vectors.
   # Outer lapply sums the number of TRUE values in each vector.
   frequencies <- lapply(lapply(permutations,`==`,tuples),sum)
   names( frequencies ) <- as.character( population )
   return( unlist(frequencies) )
 }

Finally, all three functions can be combined into one larger function that takes a vector, splits it into n-tuples, and returns a frequency table. The final aggregation is done with ldply() from Hadley Wickham's plyr package, since it does a good job of preserving information such as which permutation corresponds to which row of the output:

 permutationFrequency <- function( vector, n, population = unique( vector ) ){
   # Split the vector into tuples.
   tuples <- makeTuples( vector, n )
   # Coerce and compact the tuples to a vector of strings.
   tuples <- do.call(cbind,tuples)
   tuples <- apply( tuples, 1, paste, collapse = '' )
   # Generate permutations of n-1 elements from the population.
   # Turn into a named list for ldply() to work its magic.
   permutations <- permute( population, n-1 )
   names( permutations ) <- permutations
   frequencies <- ldply( permutations, countFrequency, tuples = tuples, population = population )
   return( frequencies )
 }

And here you are:

 require( plyr )
 permutationFrequency( s, 2 )
   .id 1 0
 1   1 2 3
 2   0 2 7
 permutationFrequency( s, 3 )
   .id 1 0
 1  11 1 1
 2  01 1 1
 3  10 0 3
 4  00 2 4
 permutationFrequency( s, 4 )
   .id 1 0
 1 111 0 1
 2 011 1 0
 3 101 0 0
 4 001 1 1
 5 110 0 1
 6 010 0 1
 7 100 0 2
 8 000 2 2
 permutationFrequency( sample( -1:1, 10, replace = T ), 2 )
   .id 1 -1 0
 1   1 1  2 0
 2  -1 0  1 2
 3   0 1  0 2

With apologies to my stochastic processes professor, the problems of functional programming in R were more interesting than Gambler's Ruin today...
