I have a fairly large data set (784,932 rows, 27,492 unique IDs). For each Item within each ID, I am trying to create a dummy variable equal to 1 if the time difference to the previous row is less than 60 seconds.
Stylized data and code:
ID <- c(1,1,1,1,1,1,3,3,3,3,3,3)
Item <- c(10,10,10,20,20,20,10,20,10,10,10,20)
Date <- c("19/11/13 18:58:00","19/11/13 18:58:21","19/11/13 20:58:00","19/11/13 18:58:00","19/11/13 18:58:00","19/11/13 18:58:00","19/11/13 18:58:00","19/11/13 18:58:00","19/11/13 18:58:00","19/11/13 18:58:00","19/11/13 18:58:00","19/11/13 19:58:00")
df <- data.frame(ID, Item, Date)
# Parse the timestamps first, then sort chronologically within each ID
df$Date <- as.POSIXct(df$Date, format = "%d/%m/%y %H:%M:%S", tz = "UTC")
df <- df[order(df$ID, df$Date), ]
# diff() picks difftime units automatically, so convert gaps to seconds first
fnDummy <- function(date) {
  gaps <- as.numeric(diff(date), units = "secs")
  ifelse(c(999, gaps) < 60, 1, 0)
}
library(plyr)
ddply(df, .(ID, Item), transform, Dummy = fnDummy(Date))
Output:
   ID Item                Date Dummy
1   1   10 2013-11-19 18:58:00     0
2   1   10 2013-11-19 18:58:21     1
3   1   10 2013-11-19 20:58:00     0
4   1   20 2013-11-19 18:58:00     0
5   1   20 2013-11-19 18:58:00     1
6   1   20 2013-11-19 18:58:00     1
7   3   10 2013-11-19 18:58:00     0
8   3   10 2013-11-19 18:58:00     1
9   3   10 2013-11-19 18:58:00     1
10  3   10 2013-11-19 18:58:00     1
11  3   20 2013-11-19 18:58:00     0
12  3   20 2013-11-19 19:58:00     0
As the output shows, rows 1 and 2 share the same ID and Item and their dates are only 21 seconds apart, so the dummy for row 2 is 1. Rows 2 and 3 also share the same ID and Item, but their dates are far more than 60 seconds apart, so the dummy for row 3 is 0.
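To make the rule concrete, here is a minimal sketch for a single (ID, Item) group, using the first three timestamps from the stylized data. The names `dates`, `gaps` and `dummy` are only illustrative. The gaps are computed on numeric seconds because `diff()` on date-times returns a difftime whose units R chooses automatically, which can make a comparison against 60 misleading.

```r
# One (ID, Item) group from the stylized data (ID 1, Item 10)
dates <- as.POSIXct(
  c("19/11/13 18:58:00", "19/11/13 18:58:21", "19/11/13 20:58:00"),
  format = "%d/%m/%y %H:%M:%S", tz = "UTC"
)

# Seconds elapsed since the previous row; the first row has no predecessor
gaps <- c(NA, diff(as.numeric(dates)))

# 1 if the row follows the previous one by less than 60 seconds, else 0
dummy <- as.integer(!is.na(gaps) & gaps < 60)

print(data.frame(dates, gaps, dummy))
```

The first row of a group always gets 0 here, which matches the `c(999, ...)` padding in `fnDummy`.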
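The code works, but it is slow: on a subset of 1000 it takes about 40 seconds (system.time output below). The full data set takes about 180 (...).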
user system elapsed
36.485 3.328 39.800
How can I speed this up? With data.table, perhaps?
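For reference, a grouped computation of this kind could be sketched in data.table roughly as follows. This is only a sketch on the stylized data, assuming the data.table package is installed; it has not been timed on the full data set, so any speed-up is an expectation rather than a measurement. It compares gaps in numeric seconds and uses `Inf` instead of 999 as the padding for the first row of each group.

```r
library(data.table)

dt <- data.table(
  ID   = c(1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3),
  Item = c(10, 10, 10, 20, 20, 20, 10, 20, 10, 10, 10, 20),
  Date = c("19/11/13 18:58:00", "19/11/13 18:58:21", "19/11/13 20:58:00",
           "19/11/13 18:58:00", "19/11/13 18:58:00", "19/11/13 18:58:00",
           "19/11/13 18:58:00", "19/11/13 18:58:00", "19/11/13 18:58:00",
           "19/11/13 18:58:00", "19/11/13 18:58:00", "19/11/13 19:58:00")
)

# Parse the timestamps once, in place
dt[, Date := as.POSIXct(Date, format = "%d/%m/%y %H:%M:%S", tz = "UTC")]

# Sort by group and time (by reference, no copy of the table)
setkey(dt, ID, Item, Date)

# Within each (ID, Item) group: 1 if the gap to the previous row is < 60 seconds
dt[, Dummy := as.integer(c(Inf, diff(as.numeric(Date))) < 60), by = list(ID, Item)]

print(dt)
```

Because `:=` adds the column by reference, no per-group data frames are built and recombined, which is where much of the time in a `ddply(..., transform, ...)` call tends to go.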