I have an R data frame containing the start and end times of events, which looks like this:
timestamp endtimestamp
1 2018-03-27 10:00:27 2018-03-27 10:07:27
2 2018-03-27 10:27:28 2018-03-27 10:37:58
3 2018-03-27 10:52:59 2018-03-27 11:01:29
4 2018-03-27 11:17:59 2018-03-27 11:27:00
5 2018-03-27 12:03:29 2018-03-27 12:15:59
6 2018-03-27 12:51:00 2018-03-27 13:01:30
7 2018-03-27 13:18:31 2018-03-27 13:26:01
8 2018-03-27 13:42:56 2018-03-27 13:50:56
9 2018-03-27 14:08:26 2018-03-27 14:21:27
10 2018-03-27 14:36:02 2018-03-27 14:43:58
I want to convert the data into hourly ranges, each with the total duration of the events that fall within that hour. An event that starts in one hour and ends in the next should be split, so that each hour is credited only with the part that occurs within it. The result should be:
starttimestamp endtimestamp duration
1 2018-03-27 10:00:00 2018-03-27 11:00:00 1471 secs
2 2018-03-27 11:00:00 2018-03-27 12:00:00 630 secs
3 2018-03-27 12:00:00 2018-03-27 13:00:00 1290 secs
4 2018-03-27 13:00:00 2018-03-27 14:00:00 1020 secs
5 2018-03-27 14:00:00 2018-03-27 15:00:00 1257 secs
I think I could do this with a loop, although that feels clumsy, and none of the solutions I have tried with dplyr / magrittr have worked.
Example: the value of 1471 seconds is calculated as follows:
2018-03-27 10:00:27 to 2018-03-27 10:07:27 = 420 seconds
2018-03-27 10:27:28 to 2018-03-27 10:37:58 = 630 seconds
2018-03-27 10:52:59 to 2018-03-27 11:00:00 = 421 seconds
420 + 630 + 421 = 1471
The third event actually ends at 11:01:29, so the remaining 01:29 is counted in the next hour.
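The arithmetic above can be checked with a few lines of base R. This is only a sketch: the names `hour_start`, `hour_end`, `overlap` and `carry_over` are made up for illustration, and the three events are retyped by hand rather than taken from the data frame.

```r
# Window for the 10:00 hour, in UTC
hour_start <- as.POSIXct("2018-03-27 10:00:00", tz = "UTC")
hour_end   <- hour_start + 3600

# The three events that touch this hour
starts <- as.POSIXct(c("2018-03-27 10:00:27",
                       "2018-03-27 10:27:28",
                       "2018-03-27 10:52:59"), tz = "UTC")
ends   <- as.POSIXct(c("2018-03-27 10:07:27",
                       "2018-03-27 10:37:58",
                       "2018-03-27 11:01:29"), tz = "UTC")

# Clip each event to the window and take the length of what is left (seconds)
overlap <- pmin(as.numeric(ends), as.numeric(hour_end)) -
           pmax(as.numeric(starts), as.numeric(hour_start))
overlap       # 420 630 421
sum(overlap)  # 1471

# Part of the third event that spills into the 11:00 hour
carry_over <- as.numeric(ends[3]) - as.numeric(hour_end)
carry_over    # 89
```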
Code to reproduce the test data:
# Reproducible test data: start and end times of ten events (UTC)
fmt <- "%Y-%m-%dT%H:%M:%OS"
test <- data.frame(
  IDX = 1:10,
  # Parse each column in a single vectorised call so the UTC time zone is kept
  timestamp = as.POSIXct(c("2018-03-27T10:00:27Z", "2018-03-27T10:27:28Z",
                           "2018-03-27T10:52:59Z", "2018-03-27T11:17:59Z",
                           "2018-03-27T12:03:29Z", "2018-03-27T12:51:00Z",
                           "2018-03-27T13:18:31Z", "2018-03-27T13:42:56Z",
                           "2018-03-27T14:08:26Z", "2018-03-27T14:36:02Z"),
                         format = fmt, tz = "UTC"),
  endtimestamp = as.POSIXct(c("2018-03-27T10:07:27Z", "2018-03-27T10:37:58Z",
                              "2018-03-27T11:01:29Z", "2018-03-27T11:27:00Z",
                              "2018-03-27T12:15:59Z", "2018-03-27T13:01:30Z",
                              "2018-03-27T13:26:01Z", "2018-03-27T13:50:56Z",
                              "2018-03-27T14:21:27Z", "2018-03-27T14:43:58Z"),
                            format = fmt, tz = "UTC")
)
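For reference, the splitting rule described above can be sketched in base R by clipping every event to every hourly window and summing the overlaps. This is a minimal sketch rather than a definitive solution. It assumes UTC timestamps (so hour boundaries fall on multiples of 3600 epoch seconds), retypes the sample data under the hypothetical names `ev_start` and `ev_end`, and uses illustrative names such as `bin_lo` and `overlap_in_bin`. It uses base R only, not dplyr.

```r
# Sample events, retyped so the block is self-contained (UTC assumed)
ev_start <- as.POSIXct(c("2018-03-27 10:00:27", "2018-03-27 10:27:28",
                         "2018-03-27 10:52:59", "2018-03-27 11:17:59",
                         "2018-03-27 12:03:29", "2018-03-27 12:51:00",
                         "2018-03-27 13:18:31", "2018-03-27 13:42:56",
                         "2018-03-27 14:08:26", "2018-03-27 14:36:02"),
                       tz = "UTC")
ev_end   <- as.POSIXct(c("2018-03-27 10:07:27", "2018-03-27 10:37:58",
                         "2018-03-27 11:01:29", "2018-03-27 11:27:00",
                         "2018-03-27 12:15:59", "2018-03-27 13:01:30",
                         "2018-03-27 13:26:01", "2018-03-27 13:50:56",
                         "2018-03-27 14:21:27", "2018-03-27 14:43:58"),
                       tz = "UTC")

# Work in epoch seconds so the interval arithmetic is plain numeric
s <- as.numeric(ev_start)
e <- as.numeric(ev_end)

# Lower edge of every hourly bin touched by the data
bin_lo <- seq(floor(min(s) / 3600) * 3600,
              floor(max(e) / 3600) * 3600,
              by = 3600)

# Total overlap (in seconds) between all events and the bin [lo, lo + 3600)
overlap_in_bin <- function(lo) {
  sum(pmax(0, pmin(e, lo + 3600) - pmax(s, lo)))
}
duration <- vapply(bin_lo, overlap_in_bin, numeric(1))

result <- data.frame(
  starttimestamp = as.POSIXct(bin_lo, origin = "1970-01-01", tz = "UTC"),
  endtimestamp   = as.POSIXct(bin_lo + 3600, origin = "1970-01-01", tz = "UTC"),
  duration       = as.difftime(duration, units = "secs")
)
result
```

This compares every event with every bin, which is fine for small inputs like the one here. For large data sets, a join-based or interval-based approach would likely scale better.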