Divide data table by hourly totals in R

Question

I have the following data, where each row corresponds to a household member making a specific trip. Because these are members of the same household, rows may have overlapping times, as rows 1 and 2 show. The trip duration is given in minutes. IDX is just an index that makes it possible to map results back to the original rows.

IDX  | ID   | Trip |   StartDateTime    | Duration (in minutes)
1    |  1   |  1   |  2015-01-21 13:00  | 100
2    |  1   |  1   |  2015-01-21 13:00  | 184
3    |  1   |  1   |  2015-01-21 10:00  | 91
4    |  1   |  2   |  2015-01-22 13:00  | 30
5    |  2   |  2   |  2015-01-30 23:00  | 100

Now I would like to split each row (per identifier, trip, and day) into hourly rows, as follows:

IDX |  ID   | Trip |   StartDateTime      | Duration (in minutes)
1   |  1    |  1   |  2015-01-21 13:00    | 60
1   |  1    |  1   |  2015-01-21 14:00    | 40

Note that the durations in this group still sum to 100, as in the first row, and that IDX is carried over from the original row. The 4th row, however, lasts no more than 60 minutes, so it is not split, resulting in:

IDX  | ID   | Trip |   StartDateTime      | Duration (in minutes)
4    |  1   |  2   |  2015-01-22 13:00    | 30

Note that a trip can also cross midnight, as with the 5th row:

IDX  | ID   | Trip |   StartDateTime      | Duration (in minutes)
5    |  2   |  2   |  2015-01-30 23:00    | 60
5    |  2   |  2   |  2015-01-31 0:00     | 40

How can I do this?

Sample data:

library(data.table)

# as.POSIXct is used because data.table columns cannot hold POSIXlt (what strptime returns)
dt <- data.table(IDX = c(1:5),
                 ID  = c(1,1,1,2,2),
                 Trip = c(1,1,1,1,2),
                 StartDateTime = as.POSIXct(c("2015-01-21 13:00","2015-01-21 13:00","2015-01-21 10:00","2015-01-22 13:00","2015-01-30 23:00"), format="%Y-%m-%d %H:%M"),
                 Duration = c(100,184,91,30,100)
)

Edit: a trip does not have to start on the hour; it may start at 13:12, for example.

So, given the following row:

IDX  | ID   | Trip |   StartDateTime      | Duration (in minutes)
6    |  3   |  1   |  2015-01-30 23:14    | 67

the result should be:

IDX  | ID   | Trip |   StartDateTime      | Duration (in minutes)
6    |  3   |  1   |  2015-01-30 23:00    | 46
6    |  3   |  1   |  2015-01-31 0:00     | 21

, , , eddi.

+2

r data.table

J. K. Lifan 08 . '15 18:18

2

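# one row at a time: seq(0, Duration, 60) gives the minute offset at which each hourly chunk starts
dt[, .(IDX, ID, Trip,
       StartDateTime = StartDateTime + 60*seq(0, Duration, 60),   # offsets converted to seconds
       Duration = diff(c(seq(0, Duration, 60), Duration)))        # length of each chunk in minutes
   , by = 1:nrow(dt)]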
#    nrow IDX ID Trip       StartDateTime Duration
# 1:    1   1  1    1 2015-01-21 13:00:00       60
# 2:    1   1  1    1 2015-01-21 14:00:00       40
# 3:    2   2  1    1 2015-01-21 13:00:00       60
# 4:    2   2  1    1 2015-01-21 14:00:00       60
# 5:    2   2  1    1 2015-01-21 15:00:00       60
# 6:    2   2  1    1 2015-01-21 16:00:00        4
# 7:    3   3  1    1 2015-01-21 10:00:00       60
# 8:    3   3  1    1 2015-01-21 11:00:00       31
# 9:    4   4  2    1 2015-01-22 13:00:00       30
#10:    5   5  2    2 2015-01-30 23:00:00       60
#11:    5   5  2    2 2015-01-31 00:00:00       40
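
To see why this works, it can help to run the two building blocks on a single duration outside of data.table. The following is a minimal sketch; the helper name `split_minutes` is made up for illustration and is not part of the answer.

```r
# Hypothetical helper: split one duration (in minutes) into
# consecutive chunks of at most 60 minutes.
split_minutes <- function(duration) {
  starts <- seq(0, duration, 60)   # minute offsets at which each chunk begins
  diff(c(starts, duration))        # chunk lengths: gaps between successive offsets
}

split_minutes(100)  # 60 40
split_minutes(184)  # 60 60 60 4
split_minutes(30)   # 30
split_minutes(120)  # 60 60 0  (note the trailing zero-length chunk)
```

The last call shows an edge case: a duration that is an exact multiple of 60 yields a zero-minute row. If that matters, such rows can be dropped afterwards by chaining `[Duration > 0]` onto the result.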

And for trips that do not start on the hour:

dt[5, StartDateTime := StartDateTime + 14*60]

library(lubridate)

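dt[, {m = minute(StartDateTime)   # minutes past the hour at departure
      # breakpoints are counted from the top of the starting hour, so that a trip
      # crossing an hour boundary is split there even if it is shorter than 60 minutes
      dur = diff(unique(c(m,
                          tail(seq(0, m + Duration, 60), -1),
                          m + Duration)))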
      list(StartDateTime = floor_date(StartDateTime, "hour") + (seq_along(dur)-1)*3600,
           Duration = dur)}
   , by = .(IDX, ID, Trip)]
#    IDX ID Trip       StartDateTime Duration
# 1:   1  1    1 2015-01-21 13:00:00       60
# 2:   1  1    1 2015-01-21 14:00:00       40
# 3:   2  1    1 2015-01-21 13:00:00       60
# 4:   2  1    1 2015-01-21 14:00:00       60
# 5:   2  1    1 2015-01-21 15:00:00       60
# 6:   2  1    1 2015-01-21 16:00:00        4
# 7:   3  1    1 2015-01-21 10:00:00       60
# 8:   3  1    1 2015-01-21 11:00:00       31
# 9:   4  2    1 2015-01-22 13:00:00       30
#10:   5  2    2 2015-01-30 23:00:00       46
#11:   5  2    2 2015-01-31 00:00:00       54
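
Since the question stresses that the split rows must still add up to the original duration, a quick sanity check may be worthwhile. The sketch below uses a two-row subset of the example, with the split values copied from the output above; the names `trips`, `res`, `check` and `Total` are chosen here for illustration only.

```r
library(data.table)

# original trips (IDX 1 and IDX 5 from the example)
trips <- data.table(IDX = c(1, 5), Duration = c(100, 100))

# the corresponding split rows, as printed above
res <- data.table(IDX = c(1, 1, 5, 5), Duration = c(60, 40, 46, 54))

# total minutes per IDX after splitting, joined back to the originals
check <- merge(trips, res[, .(Total = sum(Duration)), by = IDX], by = "IDX")

# stops with an error if any trip gained or lost minutes
stopifnot(check[, all(Duration == Total)])
```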

+2

eddi 08 . '15 18:48

Frank · Accepted Answer · 2015-10-08T19:00:42+0000

Along the lines of @eddi's answer, but using base difftime instead of lubridate:

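# modifying the example (DT is the data.table from the question): the first trip now starts at 13:12
DT[1, StartDateTime := as.POSIXct("2015-01-21 13:12")]

DT[,{
    t0  = StartDateTime                 # trip start
    t1  = StartDateTime + Duration*60   # trip end (Duration is in minutes)

    h0  = trunc(t0, units="hours")      # top of the first hour
    h1  = trunc(t1, units="hours")      # top of the last hour
    h   = seq(h0, h1, by="hour")        # hourly bins from h0 to h1
    nh  = length(h)

    # start with 60 minutes for every bin, then trim the first and the last one
    dur = as.difftime(rep("1",nh), format="%H", units="mins")
    if (h0 <  t0) dur[1 ] = difftime(h0 + as.difftime("1", format="%H", units="mins"), t0, units="mins")
    if (h1 <  t1) dur[nh] = difftime(t1, h1, units="mins")
    if (h0 == h1) dur     = difftime(t1, t0, units="mins")   # trip within a single hour

    list(h = h, dur = dur)
}, by=.(IDX, ID, Trip)]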

    IDX ID Trip                   h     dur
 1:   1  1    1 2015-01-21 13:00:00 48 mins
 2:   1  1    1 2015-01-21 14:00:00 52 mins
 3:   2  1    1 2015-01-21 13:00:00 60 mins
 4:   2  1    1 2015-01-21 14:00:00 60 mins
 5:   2  1    1 2015-01-21 15:00:00 60 mins
 6:   2  1    1 2015-01-21 16:00:00  4 mins
 7:   3  1    1 2015-01-21 10:00:00 60 mins
 8:   3  1    1 2015-01-21 11:00:00 31 mins
 9:   4  2    1 2015-01-22 13:00:00 30 mins
10:   5  2    2 2015-01-30 23:00:00 60 mins
11:   5  2    2 2015-01-31 00:00:00 40 mins
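
For comparison, the same clipping idea can be written with plain minute arithmetic, which avoids difftime units altogether. This is only a sketch: `split_by_hour` is a name invented here, it handles a single trip at a time, and it assumes a length-one POSIXct start and a positive duration.

```r
# Hypothetical helper: split one trip into clock-hour bins.
# start:   length-one POSIXct
# minutes: trip duration in minutes (assumed > 0)
split_by_hour <- function(start, minutes) {
  h0     <- as.POSIXct(trunc(start, units = "hours"))         # top of the starting hour
  offset <- as.numeric(difftime(start, h0, units = "mins"))   # minutes past that hour
  end    <- offset + minutes                                  # trip end, in minutes after h0
  k      <- seq_len(ceiling(end / 60))                        # index of each clock hour touched
  lower  <- pmax(60 * (k - 1), offset)                        # bin start, clipped to trip start
  upper  <- pmin(60 * k, end)                                 # bin end, clipped to trip end
  data.frame(h = h0 + 3600 * (k - 1), dur = upper - lower)
}

split_by_hour(as.POSIXct("2015-01-21 13:12", tz = "UTC"), 100)$dur  # 48 52
split_by_hour(as.POSIXct("2015-01-30 23:14", tz = "UTC"), 67)$dur   # 46 21
split_by_hour(as.POSIXct("2015-01-21 13:00", tz = "UTC"), 60)$dur   # 60
```

In this version a trip that ends exactly on the hour (13:00 plus 60 minutes, say) stays a single 60-minute row. Because the function returns a data.frame, it should also be usable per group, e.g. `DT[, split_by_hour(StartDateTime, Duration), by=.(IDX, ID, Trip)]`, assuming each group holds exactly one row.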
