R: how to determine the distance from the last occurrence

I want to calculate how much time has passed since an event last occurred.

In the data below, the light is on in some rows but not in others. I want to normalize the data before feeding it to a neural network.

    library(data.table)
    d <- data.table(
      date  = c("6/1/2013", "6/2/2013", "6/3/2013", "6/4/2013"),
      light = c(TRUE, FALSE, FALSE, TRUE)
    )
    d
           date light
    1: 6/1/2013  TRUE
    2: 6/2/2013 FALSE
    3: 6/3/2013 FALSE
    4: 6/4/2013  TRUE

What I would like to calculate is another column that shows the "distance" from the last occurrence, that is, from the most recent row where light is TRUE.

For the data above, the expected values are:

    row 1: 0
    row 2: 1
    row 3: 2
    row 4: 0

3 answers

This should do it:

 d[, distance := 1:.N - 1, by = cumsum(light)] 

Or, equivalently:

 d[, distance := .I - .I[1], by = cumsum(light)] 
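Both variants rely on cumsum(light) increasing by one at every TRUE, so each TRUE row starts a new group and the FALSE rows that follow it share the same group id. The following is a minimal base R sketch of that idea, not the answer's own code; the longer `light` vector and the use of `ave()` are assumptions made for illustration.

```r
light <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

# Group id: goes up by one at every TRUE, stays constant over the FALSEs after it
grp <- cumsum(light)
grp
# 1 1 1 2 3 3

# Row index minus the first row index of the group, so each TRUE row gets 0
distance <- ave(seq_along(light), grp, FUN = function(i) i - i[1])
distance
# 0 1 2 0 0 1
```

Note that if the data starts with FALSE rows, they fall into group 0 and are counted from the first row rather than from an actual occurrence.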

And if you want to count the actual number of days rather than the number of rows, you can use:

    d[, distance := as.numeric(
          as.POSIXct(date, format = "%m/%d/%Y") -
            as.POSIXct(date[1], format = "%m/%d/%Y"),
          units = "days"
        ),
      by = cumsum(light)]
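When the dates carry no time component, a variant based on as.Date may be simpler, since it does not depend on the session time zone (with POSIXct, a daylight-saving change between two dates can make the difference a fraction of a day). This is a sketch of an alternative, not what the answer uses; the helper column `day` and the gap in the sample dates are made up to show that days and rows can differ.

```r
library(data.table)

# Hypothetical data with a gap between 6/2 and 6/5
d <- data.table(
  date  = c("6/1/2013", "6/2/2013", "6/5/2013", "6/6/2013"),
  light = c(TRUE, FALSE, FALSE, TRUE)
)

# Parse the strings once, then subtract the first date of each group (the TRUE row)
d[, day := as.Date(date, format = "%m/%d/%Y")]
d[, distance := as.numeric(day - day[1]), by = cumsum(light)]

d$distance
# 0 1 4 0
```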

I would suggest creating a grouping column that starts a new group at every row where light is TRUE:

    # create group column
    d[(light), group := cumsum(light)]
    d[is.na(group), group := 0L]
    d[, group := cumsum(group)]
    d

Then count the rows within each group by taking the cumulative sum of the negated light column:

    d[, distance := cumsum(!light), by = group]

    # remove the group column for cleanliness
    d[, group := NULL]

Results:

    d
             date light distance
    1: 2013-06-01  TRUE        0
    2: 2013-06-02 FALSE        1
    3: 2013-06-03 FALSE        2
    4: 2013-06-04  TRUE        0
    5: 2013-06-05  TRUE        0
    6: 2013-06-06 FALSE        1
    7: 2013-06-07 FALSE        2
    8: 2013-06-08  TRUE        0

Note that I added a few rows to the example data.


An approach using run-length encoding (rle) and sequence, which is a wrapper for unlist(lapply(nvec, seq_len)):

    d[, distance := sequence(rle(light)$lengths)][(light), distance := 0L]
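To see why this one-liner works, it may help to look at the intermediate values. The sketch below uses plain vectors instead of a data.table, and the sample `light` vector is an assumption chosen to include two consecutive TRUE values.

```r
light <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

# Run-length encoding: lengths of consecutive runs of equal values
runs <- rle(light)
runs$lengths
# 1 2 2 1
runs$values
# TRUE FALSE TRUE FALSE

# Count 1..n inside every run, whether the run is TRUE or FALSE
distance <- sequence(runs$lengths)
distance
# 1 1 2 1 2 1

# A row where the light is on is itself an occurrence, so reset it to 0
distance[light] <- 0L
distance
# 0 1 2 0 0 1
```

The final reset is what the chained `[(light), distance := 0L]` step does in the data.table version.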

Source: https://habr.com/ru/post/948911/

