Create a time interval of 15 minutes from data in R?

I have some data that is formatted as follows:

    time   count
    00:00  17
    00:01  62
    00:02  41

So I have data from 00:00 to 23:59, with one count per minute. I would like to group the data into 15-minute intervals, so that:

    time         count
    00:00-00:15  148
    00:16-00:30  284

I tried to do it manually, but it is grueling. I'm sure there must be a function or something similar that makes this easy, but I haven't figured out how to do it yet.

I would really appreciate help!

Thank you very much!

2 answers

For data in POSIXct format, you can use the cut function to create 15-minute groupings and then aggregate by those groups. The code below shows how to do this in base R, and with dplyr and data.table.

First create some fake data:

    set.seed(4984)
    dat = data.frame(time=seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by=60),
                     count=sample(1:50, 100, replace=TRUE))
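The question's times are "HH:MM" strings rather than POSIXct, so they would need converting before cut can be applied. A minimal sketch, assuming a placeholder date (2016-05-01) and the UTC time zone, neither of which comes from the question; the data frame name raw and the last two rows are hypothetical:

```r
# Hypothetical data shaped like the question's: "HH:MM" strings plus a count.
# The first three rows are from the question; the last two are made up.
raw <- data.frame(time = c("00:00", "00:01", "00:02", "00:15", "00:16"),
                  count = c(17, 62, 41, 5, 9),
                  stringsAsFactors = FALSE)

# Attach a placeholder date so the strings can be parsed as POSIXct.
# Both the date and tz = "UTC" are assumptions made for this sketch.
raw$time <- as.POSIXct(paste("2016-05-01", raw$time),
                       format = "%Y-%m-%d %H:%M", tz = "UTC")

str(raw)
```

After this conversion, raw$time can be passed to cut in the same way as dat$time.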

Base R

cut data into 15 minute groups:

    dat$by15 = cut(dat$time, breaks="15 min")

                       time count                by15
    1   2016-05-01 00:00:00    22 2016-05-01 00:00:00
    2   2016-05-01 00:01:00    11 2016-05-01 00:00:00
    3   2016-05-01 00:02:00    31 2016-05-01 00:00:00
    ...
    98  2016-05-01 01:37:00    20 2016-05-01 01:30:00
    99  2016-05-01 01:38:00    29 2016-05-01 01:30:00
    100 2016-05-01 01:39:00    37 2016-05-01 01:30:00

Now aggregate using the new grouping column, using sum as the aggregation function:

    dat.summary = aggregate(count ~ by15, FUN=sum, data=dat)

                     by15 count
    1 2016-05-01 00:00:00   312
    2 2016-05-01 00:15:00   395
    3 2016-05-01 00:30:00   341
    4 2016-05-01 00:45:00   318
    5 2016-05-01 01:00:00   349
    6 2016-05-01 01:15:00   397
    7 2016-05-01 01:30:00   341

dplyr

    library(dplyr)

    dat.summary = dat %>%
      group_by(by15=cut(time, "15 min")) %>%
      summarise(count=sum(count))

data.table

    library(data.table)

    dat.summary = setDT(dat)[ , list(count=sum(count)), by=cut(time, "15 min")]

UPDATE: To answer the comment, in this case the endpoint of each grouping interval is as.POSIXct(as.character(dat$by15)) + 60*15 - 1. In other words, the endpoint is 15 minutes minus one second after the beginning of the interval. We add 60*15 - 1 because POSIXct is expressed in seconds. The as.POSIXct(as.character(...)) wrapper is needed because cut returns a factor; it converts the factor labels back to date-times so that we can do arithmetic on them.

If you want the endpoint to fall on the last whole minute before the next interval (rather than the last second), you could use as.POSIXct(as.character(dat$by15)) + 60*14.
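The endpoint arithmetic can also be used to build "start-end" labels like those in the question. A minimal sketch, assuming hypothetical data that starts at 08:00 UTC with counts 1 to 45 (both arbitrary choices for illustration); the names bin_start, bin_end and interval are introduced here and are not part of the original answer. Note that bins of exactly 15 minutes run from :00 to :14, not :00 to :15:

```r
# Hypothetical per-minute data; the 08:00 UTC start and the counts are arbitrary
start <- as.POSIXct("2016-05-01 08:00:00", tz = "UTC")
dat <- data.frame(time = seq(start, by = 60, length.out = 45),
                  count = 1:45)
dat$by15 <- cut(dat$time, breaks = "15 min")

dat.summary <- aggregate(count ~ by15, data = dat, FUN = sum)

# cut() returns a factor, so convert the labels back to date-times first
bin_start <- as.POSIXct(as.character(dat.summary$by15), tz = "UTC")
bin_end   <- bin_start + 60 * 14   # last whole minute inside each bin

# Build "HH:MM-HH:MM" labels
dat.summary$interval <- paste(format(bin_start, "%H:%M"),
                              format(bin_end, "%H:%M"), sep = "-")
print(dat.summary)
```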

If you do not know the break interval, for example because you specified the number of breaks and let R choose the interval, you can find the number of seconds to add by running max(unique(diff(as.POSIXct(as.character(dat$by15))))) - 1.
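One caveat when deriving the width from the bin starts: diff on date-times returns a difftime whose units R picks automatically, so converting to seconds explicitly is the safer route. A sketch under the same assumptions as before (hypothetical data starting at 08:00 UTC); width_secs, bin_start and bin_end are names made up for this example:

```r
# Hypothetical per-minute timestamps; the 08:00 UTC start is arbitrary
start <- as.POSIXct("2016-05-01 08:00:00", tz = "UTC")
time <- seq(start, by = 60, length.out = 45)
by15 <- cut(time, breaks = "15 min")

# One start time per bin, recovered from the factor levels
bin_start <- as.POSIXct(levels(by15), tz = "UTC")

# diff() gives a difftime; ask for seconds explicitly instead of
# relying on whatever units were chosen automatically
gaps <- diff(bin_start)
width_secs <- max(as.numeric(gaps, units = "secs"))

# Endpoint of each bin: one second before the next bin begins
bin_end <- bin_start + width_secs - 1
print(data.frame(bin_start, bin_end))
```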


The cut approach is convenient, but slow with large data frames. The following approach is approximately 1000 times faster than the cut approach (tested on 400 thousand records).

    # Function: Truncate (floor) POSIXct to time interval (specified in seconds)
    # Author: Stephen McDaniel @ PowerTrip Analytics
    # Date : 2017MAY
    # Copyright: (C) 2017 by Freakalytics, LLC
    # License: MIT
    floor_datetime <- function(date_var, floor_seconds = 60, origin = "1970-01-01") {
      # defaults to minute rounding
      if (!inherits(date_var, "POSIXct")) stop("Please pass in a POSIXct variable")
      # NA values propagate through the arithmetic, so no separate NA check is
      # needed (an if() on is.na() would also fail for vectors longer than one)
      as.POSIXct(floor(as.numeric(date_var) / floor_seconds) * floor_seconds,
                 origin = origin)
    }

Output Example:

    test <- data.frame(good = as.POSIXct(Sys.time()),
                       bad1 = as.Date(Sys.time()),
                       bad2 = as.POSIXct(NA))
    test$good_15 <- floor_datetime(test$good, 15 * 60)
    test$bad1_15 <- floor_datetime(test$bad1, 15 * 60)
    # Error in floor_datetime(test$bad1, 15 * 60) :
    #   Please pass in a POSIXct variable
    test$bad2_15 <- floor_datetime(test$bad2, 15 * 60)
    test

                        good       bad1 bad2             good_15 bad2_15
    1 2017-05-06 13:55:34.48 2017-05-06 <NA> 2017-05-06 13:45:00    <NA>
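To tie the flooring idea back to the original question, here is a sketch that bins per-minute counts into 15-minute groups with the same arithmetic and checks the totals against cut. The data are fake, the 08:00 UTC start is an arbitrary assumption, and the names floor.summary and cut.summary are made up for this example. The two approaches agree here because the first timestamp sits on a 15-minute boundary; cut starts its bins from the first observation, whereas flooring aligns bins to the clock, so the groupings may differ for data that start elsewhere.

```r
# Fake per-minute data, similar in shape to the example in the first answer
set.seed(4984)
start <- as.POSIXct("2016-05-01 08:00:00", tz = "UTC")
dat <- data.frame(time = seq(start, by = 60, length.out = 100),
                  count = sample(1:50, 100, replace = TRUE))

# Floor each timestamp to the start of its 15-minute interval.
# POSIXct values are stored as seconds since 1970-01-01 UTC.
interval <- 15 * 60
dat$by15 <- as.POSIXct(floor(as.numeric(dat$time) / interval) * interval,
                       origin = "1970-01-01", tz = "UTC")

# Sum the counts within each interval
floor.summary <- aggregate(count ~ by15, data = dat, FUN = sum)

# The same grouping via cut(), for comparison
dat$by15_cut <- cut(dat$time, breaks = "15 min")
cut.summary <- aggregate(count ~ by15_cut, data = dat, FUN = sum)

# Both approaches should give the same per-interval totals for this data
stopifnot(identical(floor.summary$count, cut.summary$count))
print(floor.summary)
```

A side effect of this design is that the grouping column stays a POSIXct vector rather than a factor, so endpoint arithmetic needs no conversion step.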

Source: https://habr.com/ru/post/1247828/

