How can I aggregate events that are close together in time in R?

I have a data frame of events and need to display the start, end and count of each run, where a run is a sequence of events in which each event occurs less than a certain period of time after the previous one.

The data frame rows are already sorted by time.

For example:

library(lubridate)

ts <- c("2016-10-28 19:21:19",
        "2016-10-28 19:21:20",
        "2016-10-28 19:21:21",
        "2016-10-28 19:21:21",
        "2016-10-28 19:23:23",
        "2016-10-28 19:23:24",
        "2016-10-28 19:23:24",
        "2016-10-28 19:23:25",
        "2016-10-30 03:59:09",
        "2016-10-30 08:54:31",
        "2016-10-30 08:54:35"
)

df  <- data.frame(time=ymd_hms(ts))

What I would like to get is a data frame like the following, where a new run starts whenever an event is more than 60 seconds after the previous event:

start                end                  count
2016-10-28 19:21:19  2016-10-28 19:21:21  4 
2016-10-28 19:23:23  2016-10-28 19:23:25  4
2016-10-30 03:59:09  2016-10-30 03:59:09  1
2016-10-30 08:54:31  2016-10-30 08:54:35  2

The actual sequences will be very long, so the solution should work well with large inputs (~100k rows).

I looked at lag, diff and other functions, but cannot see a simple and efficient way to do it.

1 answer

Here is the code using dplyr.

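First, arrange by time, just in case. Then compute timeChange with difftime as the gap to the previous event, and flag it in isBigChange when it is large (more than 60 seconds). Since TRUE counts as 1, the cumsum of isBigChange increases by one at every big change, which gives each run its own group number. Finally, group_by that group and summarise.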

library(dplyr)

df %>%
  arrange(time) %>%
  mutate(timeChange = difftime(time, lag(time, default = time[1])
                               , units = "secs")
         , isBigChange = timeChange > 60
         , group = cumsum(isBigChange)) %>%
  group_by(group) %>%
  summarise(
    start = min(time)
    , end = max(time)
    , count = n()
  )

  group               start                 end count
  <int>              <dttm>              <dttm> <int>
1     0 2016-10-28 19:21:19 2016-10-28 19:21:21     4
2     1 2016-10-28 19:23:23 2016-10-28 19:23:25     4
3     2 2016-10-30 03:59:09 2016-10-30 03:59:09     1
4     3 2016-10-30 08:54:31 2016-10-30 08:54:35     2
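
Since the question mentions diff and large inputs, the same idea can also be sketched in base R without any packages. This is not the code from the answer above, only a minimal illustration that assumes the timestamps are already sorted and non-empty, as stated in the question. The names gap_secs, gaps, first_idx, last_idx and runs are made up for this example.

```r
# Sample timestamps from the question (assumed to be sorted already).
ts <- c("2016-10-28 19:21:19", "2016-10-28 19:21:20", "2016-10-28 19:21:21",
        "2016-10-28 19:21:21", "2016-10-28 19:23:23", "2016-10-28 19:23:24",
        "2016-10-28 19:23:24", "2016-10-28 19:23:25", "2016-10-30 03:59:09",
        "2016-10-30 08:54:31", "2016-10-30 08:54:35")
time <- as.POSIXct(ts, tz = "UTC")

gap_secs <- 60  # threshold from the question (hypothetical variable name)

# Gap to the previous event, in seconds; there is one gap fewer than events.
gaps <- as.numeric(diff(time), units = "secs")

# A run starts at the first event and after every gap above the threshold.
first_idx <- which(c(TRUE, gaps > gap_secs))
# Each run ends just before the next run starts; the last run ends at the last event.
last_idx <- c(first_idx[-1] - 1L, length(time))

runs <- data.frame(
  start = time[first_idx],
  end   = time[last_idx],
  count = last_idx - first_idx + 1L
)
print(runs)
```

Because the input is sorted, the start and end of each run are simply its first and last elements, so no grouping step is needed and everything stays vectorised. If the data might not be sorted, call sort(time) first.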

Source: https://habr.com/ru/post/1659418/

