Using the rolling time interval to count the rows in R and dplyr

Question


Let's say I have timestamps with the corresponding number of tickets sold at each time.

         Timestamp          ticket_count
            (time)              (int)
1  2016-01-01 05:30:00            1
2  2016-01-01 05:32:00            1
3  2016-01-01 05:38:00            1
4  2016-01-01 05:46:00            1
5  2016-01-01 05:47:00            1
6  2016-01-01 06:07:00            1
7  2016-01-01 06:13:00            2
8  2016-01-01 06:21:00            1
9  2016-01-01 06:22:00            1
10 2016-01-01 06:25:00            1

I want to calculate, for each row, the number of tickets sold in a rolling time window. For example, I want the number of tickets sold within 15 minutes of each row's timestamp. In this case, the first row would have three tickets, the second row four tickets, and so on.

Ideally, I am looking for a dplyr solution, since I want to do this for several stores using group_by(). However, I am having trouble keeping each row's timestamp fixed while comparing it against all the other timestamps in dplyr syntax.


Tags: r, dplyr

Asked by dmartin, 24 Jun '16 16:11


Arun · Answer 1 · 2016-06-25T22:11:14+0000

With data.table v1.9.7+, non-equi joins are possible. Assuming your data.frame is df and the Timestamp column is of POSIXct type:

require(data.table) # v1.9.7+
window = 15L # minutes
(counts = setDT(df)[.(t=Timestamp+window*60L), on=.(Timestamp<t), 
                     .(counts=sum(ticket_count)), by=.EACHI]$counts)
#  [1]  3  4  5  5  5  9 11 11 11 11

# add that as a column to original data.table by reference
df[, counts := counts]

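For each row in t, all rows where df$Timestamp < that row's t are fetched, and by=.EACHI runs the expression sum(ticket_count) once for each row in t. That gives the desired result.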
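The join condition Timestamp < t has only an upper bound, so each count also includes every earlier ticket (which is why the later rows reach 9 and 11). If you want only the tickets from each row's own timestamp up to 15 minutes later, a sketch that adds a lower bound, plus an equality condition on a hypothetical `store` column to cover several stores, could look like this. It assumes a closed window [Timestamp, Timestamp + 15 min] and uses the question's data with tz = "UTC":

```r
library(data.table)  # non-equi joins need v1.9.7+

# The question's data; `store` is a hypothetical column for per-store counts.
dt <- data.table(
  Timestamp = as.POSIXct("2016-01-01 05:30:00", tz = "UTC") +
    60 * c(0, 2, 8, 16, 17, 37, 43, 51, 52, 55),
  ticket_count = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L),
  store = "A"
)

window_secs <- 15L * 60L

# One lookup row per original row: same store, closed window [start, end].
lookup <- data.table(store = dt$store,
                     start = dt$Timestamp,
                     end   = dt$Timestamp + window_secs)

# Equi-join on store, non-equi join on both window bounds;
# by = .EACHI sums ticket_count once per lookup row, in lookup order.
res <- dt[lookup,
          on = .(store, Timestamp >= start, Timestamp <= end),
          .(tickets = sum(ticket_count)),
          by = .EACHI]

dt[, tickets_15min := res$tickets]
dt$tickets_15min
# expected: 3 4 3 2 1 5 5 3 2 1
```

The first two values match the 3 and 4 from the question; the difference from the upper-bound-only join shows up from the third row on.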


maloneypatr · Answer 2 · 2016-06-24T18:23:39+0000

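This groups the timestamps into fixed 15-minute buckets with cut() and sums the tickets in each bucket (rather than using a window that starts at each row):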

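# install.packages('dplyr')
library(dplyr)

# Assumes `your_data` has a character column `timestamp` in "mm/dd/YYYY HH:MM"
# format; if the column is already POSIXct (as in the question), skip the
# as.POSIXct() conversion.
your_data %>%
  mutate(timestamp = as.POSIXct(timestamp, format = '%m/%d/%Y %H:%M'),
         ticket_count = as.numeric(ticket_count)) %>%
  # assign each timestamp to a fixed 15-minute bin
  mutate(window = cut(timestamp, '15 min')) %>%
  group_by(window) %>%
  dplyr::summarise(tickets = sum(ticket_count))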

               window tickets
               (fctr)   (dbl)
1 2016-01-01 05:30:00       3
2 2016-01-01 05:45:00       2
3 2016-01-01 06:00:00       3
4 2016-01-01 06:15:00       3
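If you need the per-row window from the question instead of fixed bins, and want to stay in dplyr with group_by(), a minimal sketch is to compare each row's timestamp against all timestamps in its group. It assumes a closed window [Timestamp, Timestamp + 15 min] and a hypothetical `store` column; the data is the question's, with tz = "UTC":

```r
library(dplyr)

# The question's data; `store` is a hypothetical column to show group_by().
df <- data.frame(
  Timestamp = as.POSIXct("2016-01-01 05:30:00", tz = "UTC") +
    60 * c(0, 2, 8, 16, 17, 37, 43, 51, 52, 55),
  ticket_count = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L),
  store = "A"
)

window_secs <- 15 * 60  # 15 minutes, in seconds

result <- df %>%
  group_by(store) %>%
  mutate(tickets_15min = vapply(
    seq_along(Timestamp),
    function(i) {
      # keep row i's timestamp fixed and scan every timestamp in this store
      in_window <- Timestamp >= Timestamp[i] &
                   Timestamp <= Timestamp[i] + window_secs
      sum(ticket_count[in_window])
    },
    integer(1)
  )) %>%
  ungroup()

result$tickets_15min
# expected: 3 4 3 2 1 5 5 3 2 1
```

This is quadratic in the number of rows per store, which is fine for modest data but slows down on large groups.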

Martin schmelzer · Answer 3 · 2016-06-24T16:33:53+0000

Here is a solution using data.table. It also handles several stores.

Sample data:

library(data.table)
# 2000 one-minute timestamps spread across four stores; ticket counts are random
dt <- data.table(Timestamp = as.POSIXct("2016-01-01 05:30:00")+seq(60,120000,by=60),
                 ticket_count = sample(1:9, 2000, TRUE),
                 store = c(rep(c("A","B","C","D"), 500)))

Now apply the following:

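ts <- dt$Timestamp
# for() drops the POSIXct class, so x is seconds since the epoch;
# comparing it with the POSIXct column still works
for(x in ts) {
  end <- x+900  # 15 minutes later
  # per-store sum over [x, x + 15 min]; later iterations overwrite earlier
  # rows, so each row keeps the window that starts at its own timestamp
  dt[Timestamp <= end & Timestamp >= x ,CS := sum(ticket_count),by=store]
}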

It gives you the following (your numbers will differ, since ticket_count is sampled at random):

                    Timestamp ticket_count store CS
       1: 2016-01-01 05:31:00            3     A 13
       2: 2016-01-01 05:32:00            5     B 20
       3: 2016-01-01 05:33:00            3     C 19
       4: 2016-01-01 05:34:00            7     D 12
       5: 2016-01-01 05:35:00            1     A 15
      ---                                          
    1996: 2016-01-02 14:46:00            4     D 10
    1997: 2016-01-02 14:47:00            9     A  9
    1998: 2016-01-02 14:48:00            2     B  2
    1999: 2016-01-02 14:49:00            2     C  2
    2000: 2016-01-02 14:50:00            6     D  6
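The loop rescans the whole table once per timestamp, which gets slow as the data grows. Once rows are sorted by time within a store, each window is a contiguous run of rows, so a sketch using cumulative sums and findInterval() can compute the same closed-window totals without a loop. The helper name rolling_window_sum is hypothetical, and the example uses the question's data with a hypothetical single store and tz = "UTC":

```r
library(data.table)

# Hypothetical helper: sum of counts over the closed window [t, t + window_secs]
# for every t in `times`. Requires `times` sorted ascending.
rolling_window_sum <- function(times, counts, window_secs) {
  secs  <- as.numeric(times)
  csum  <- c(0, cumsum(counts))                           # csum[k + 1] = counts[1] + ... + counts[k]
  lower <- findInterval(secs, secs, left.open = TRUE)     # rows strictly before secs[i]
  upper <- findInterval(secs + window_secs, secs)         # rows at or before secs[i] + window
  csum[upper + 1] - csum[lower + 1]                       # sum over rows lower+1 .. upper
}

dt <- data.table(
  Timestamp = as.POSIXct("2016-01-01 05:30:00", tz = "UTC") +
    60 * c(0, 2, 8, 16, 17, 37, 43, 51, 52, 55),
  ticket_count = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L),
  store = "A"
)

setorder(dt, store, Timestamp)  # the helper needs ascending times within each store
dt[, CS := rolling_window_sum(Timestamp, ticket_count, 15 * 60), by = store]
dt$CS
# expected: 3 4 3 2 1 5 5 3 2 1
```

Using left.open = TRUE for the lower bound keeps rows that share the current timestamp inside the window, matching the Timestamp >= x condition in the loop.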
