Revealing sequences of repeated numbers in R

I have a long time series where I need to identify and designate sequences of repeating values. Here are some details:

       DATETIME WDIR
    1  40360.04   22
    2  40360.08   23
    3  40360.12  126
    4  40360.17  126
    5  40360.21  126
    6  40360.25  126
    7  40360.29   25
    8  40360.33   26
    9  40360.38  132
    10 40360.42  132
    11 40360.46  132
    12 40360.50   30
    13 40360.54  132
    14 40360.58   35

So if I need to note when a value is repeated three or more times in a row, I have a sequence of four "126" and a sequence of three "132" that should be labeled.

I am very new to R. I expect that I will use cbind to create a new column in this array with a "T" in the corresponding rows, but how to properly fill the column is a mystery. Any pointers please? Thanks a lot.

3 answers

As already suggested, you can use rle.

    rle(dat$WDIR)
    Run Length Encoding
      lengths: int [1:9] 1 1 4 1 1 3 1 1 1
      values : int [1:9] 22 23 126 25 26 132 30 132 35

rle returns an object with two components, lengths and values. We can use the lengths component to build a new column that identifies which values are repeated three or more times in a row.

    tmp <- rle(dat$WDIR)
    rep(tmp$lengths >= 3, times = tmp$lengths)
    [1] FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE

This will be our new column.

    # Expand the per-run test back to one flag per row (runs of 3 or more)
    newCol <- rep(tmp$lengths >= 3, times = tmp$lengths)
    cbind(dat, newCol)
       DATETIME WDIR newCol
    1  40360.04   22  FALSE
    2  40360.08   23  FALSE
    3  40360.12  126   TRUE
    4  40360.17  126   TRUE
    5  40360.21  126   TRUE
    6  40360.25  126   TRUE
    7  40360.29   25  FALSE
    8  40360.33   26  FALSE
    9  40360.38  132   TRUE
    10 40360.42  132   TRUE
    11 40360.46  132   TRUE
    12 40360.50   30  FALSE
    13 40360.54  132  FALSE
    14 40360.58   35  FALSE
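
The same idea can be wrapped in a small reusable function. This is only a sketch: the name flag_runs and its min_len argument are made up for illustration and are not part of the answer above, and the WDIR values are typed in by hand from the question.

```r
# Hypothetical helper: flag every element that belongs to a run of at least
# `min_len` identical consecutive values.
flag_runs <- function(x, min_len = 3) {
  runs <- rle(x)
  # One TRUE/FALSE per run, repeated once for each element of that run
  rep(runs$lengths >= min_len, times = runs$lengths)
}

# WDIR column from the question
wdir <- c(22, 23, 126, 126, 126, 126, 25, 26, 132, 132, 132, 30, 132, 35)

flagged <- flag_runs(wdir)
which(flagged)  # positions 3 4 5 6 9 10 11
```

With a data frame this could be used as dat$newCol <- flag_runs(dat$WDIR), which avoids the separate cbind step.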

Use rle for this. It computes the lengths of runs of consecutive equal values in a vector. The code below uses it to return all rows of the data frame whose WDIR is repeated 3 or more times in a row. The run lengths are expanded back to one flag per row, because matching on the value alone (WDIR %in% runs$values[runs$lengths >= 3]) would also return row 13, where 132 appears on its own.

    runs <- rle(mydf$WDIR)
    mydf[rep(runs$lengths >= 3, times = runs$lengths), ]
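
The difference between selecting rows by value and selecting them by run is easy to miss, so here is a small sketch that compares the two on the data from the question. The names long_values, by_value and by_run are made up for this illustration.

```r
# Data from the question, rebuilt by hand
mydf <- data.frame(
  DATETIME = c(40360.04, 40360.08, 40360.12, 40360.17, 40360.21, 40360.25,
               40360.29, 40360.33, 40360.38, 40360.42, 40360.46, 40360.50,
               40360.54, 40360.58),
  WDIR = c(22, 23, 126, 126, 126, 126, 25, 26, 132, 132, 132, 30, 132, 35)
)

runs <- rle(mydf$WDIR)

# Values that form a run of 3 or more somewhere in the column
long_values <- runs$values[runs$lengths >= 3]

# Selecting by value: TRUE wherever such a value occurs at all
by_value <- mydf$WDIR %in% long_values

# Selecting by run: TRUE only for rows inside a run of 3 or more
by_run <- rep(runs$lengths >= 3, times = runs$lengths)

# Rows picked up by value matching but not part of a long run
mydf[by_value & !by_run, ]  # row 13, the isolated 132
```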

Two options for you.

Assuming data is loaded:

    dat <- read.table(textConnection("
    DATETIME WDIR
    40360.04 22
    40360.08 23
    40360.12 126
    40360.17 126
    40360.21 126
    40360.25 126
    40360.29 25
    40360.33 26
    40360.38 132
    40360.42 132
    40360.46 132
    40360.50 30
    40360.54 132
    40360.58 35"), header = TRUE)

Option 1: Sort

    # Sort by WDIR so the repeated counts land in the correct rows in the next step
    dat <- dat[order(dat$WDIR), ]
    dat$count <- rep(table(dat$WDIR), table(dat$WDIR))
    dat$more4 <- ifelse(dat$count < 4, FALSE, TRUE)
    # Sort back to the original order
    dat <- dat[order(dat$DATETIME), ]
    dat

Option 2: One-liner

    dat$more4 <- dat$WDIR %in% names(which(table(dat$WDIR) > 3))
    dat

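I thought a new user might prefer the simple step-by-step version, although rep(table(), table()) can be unintuitive at first. Note that both options count how often a value occurs in the whole column rather than how many times it repeats in a row, and both use a threshold of four occurrences.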
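
To make the rep(table(), table()) step less opaque, here is a minimal sketch on a tiny made-up vector. The variable names are hypothetical, and the ave() call at the end is an alternative not used in the answer above: it produces the same per-element counts without sorting first.

```r
# Small made-up example, sorted so that equal values sit next to each other
wdir_sorted <- sort(c(22, 126, 126, 23, 126))   # 22 23 126 126 126

# table() counts how often each distinct value occurs: 22 -> 1, 23 -> 1, 126 -> 3
n <- as.vector(table(wdir_sorted))

# Repeat each count as many times as its value occurs, giving one count per
# element of the sorted vector
count_sorted <- rep(n, times = n)               # 1 1 3 3 3

# Alternative without sorting: ave() applies length() within each group of
# equal values and returns the result in the original order
wdir <- c(22, 126, 126, 23, 126)
count_orig <- ave(wdir, wdir, FUN = length)     # 1 3 3 1 3
```

Either way, the result is a count of total occurrences per value, not the length of a consecutive run.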


Source: https://habr.com/ru/post/897796/

