Edit 2: I realized that I can use dcast() to accomplish what I want to do. However, I do not want to include all the events in the event data, only those that occurred before the date specified in another data set. I can't figure out how to use the subset argument in dcast(). So far I have tried:
dcast(dt.events, Email ~ EventType, fun.aggregate = length, subset = as.Date(Date) <=
as.Date(dt.users$CreatedDate[dt.users$Email == dt.events$Email]))
However, this does not work. I could add the CreatedDate column from dt.users to dt.events and then subset using:
dcast(dt.events, Email ~ EventType, fun.aggregate = length, subset = .(as.Date(Date) <=
as.Date(CreatedDate)))
I was wondering if it is possible to do this without adding an extra column?
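A minimal sketch of one possible approach, assuming the data.table version of dcast(): merge() returns a new joined table, so CreatedDate is available to the subset argument without dt.events itself gaining a column. The sample rows are copied from the tables in this post (UserID is left out because it is not needed), and `wide` is a name made up for this example. The comparison follows the description further down (CreatedDate on or before Date); the attempts above compare the other way round, so flip the operator if that is what is intended.

```r
library(data.table)

# Sample data copied from the tables in this post
dt.events <- data.table(
  Email     = paste0("User", c(1, 1, 1, 1, 2, 2, 2, 3, 3, 1, 1, 2, 3, 3), "@*.com"),
  EventType = paste0("Type", c(2, 6, 1, 3, 1, 1, 2, 1, 3, 2, 2, 5, 1, 4)),
  Date      = rep(c("2016-01-02", "2016-01-04"), times = c(9, 5))
)
dt.users <- data.table(
  Email       = paste0("User", 1:3, "@*.com"),
  CreatedDate = c("2016-01-01", "2016-01-03", "2016-01-03")
)

# merge() builds a temporary joined table; dt.events is left untouched.
# subset takes an expression wrapped in .() and is evaluated on the joined table.
wide <- dcast(
  merge(dt.events, dt.users, by = "Email"),
  Email ~ EventType,
  fun.aggregate = length,
  value.var = "Date",
  subset = .(as.Date(CreatedDate) <= as.Date(Date))
)
print(wide)
```

Users whose events are all filtered out would not appear in the result at all, so a join back onto the user list may still be needed if zero rows matter.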
Edit: I just calculated that it would probably take about 37 hours to complete the way I am doing it now, so if anyone has any hints to make it faster, please let me know :)
I'm new to R. I figured out a way to do what I want, but it is extremely inefficient and takes several hours.
I have the following:
Event Data:
UserID Email EventType Date
User1 User1@*.com Type2 2016-01-02
User1 User1@*.com Type6 2016-01-02
User1 User1@*.com Type1 2016-01-02
User1 User1@*.com Type3 2016-01-02
User2 User2@*.com Type1 2016-01-02
User2 User2@*.com Type1 2016-01-02
User2 User2@*.com Type2 2016-01-02
User3 User3@*.com Type1 2016-01-02
User3 User3@*.com Type3 2016-01-02
User1 User1@*.com Type2 2016-01-04
User1 User1@*.com Type2 2016-01-04
User2 User2@*.com Type5 2016-01-04
User3 User3@*.com Type1 2016-01-04
User3 User3@*.com Type4 2016-01-04
Each time a user does something, the event is recorded with an event type and a timestamp.
List of users from a different database:
UserID Email CreatedDate
DxUs1 User1@*.com 2016-01-01
DxUs2 User2@*.com 2016-01-03
DxUs3 User3@*.com 2016-01-03
I want to get the following:
A summary table that counts the number of events of each type in the event data for each user in the user list. However, an event should only be counted if the "CreatedDate" in the user list is before or equal to the "Date" in the event data.
For the sample data above, the result would be:
Email Type1 Type2 Type3 Type4 Type5 Type6
User1@*.com 1 3 1 0 0 1
User2@*.com 0 0 0 0 1 0
User3@*.com 1 0 0 1 0 0
To do this, I created a table, dt.master, filled with zeros:
Email Type1 Type2 Type3 Type4 Type5 Type6
User1@*.com 0 0 0 0 0 0
User2@*.com 0 0 0 0 0 0
User3@*.com 0 0 0 0 0 0
Then I fill it in with a while loop:
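library(dplyr)       # filter(), group_size()
library(data.table)  # as.data.table(), set()

# dt.events, dt.users and dt.master are assumed to be loaded already
counter_limit = group_size(dt.master)  # equals nrow(dt.master) when dt.master is not grouped
index = 1
while (index <= counter_limit) {
  # Events of the current user that pass the date filter
  dt.events.temp = filter(dt.events, dt.events$Email %in% dt.users$Email[index],
                          as.Date(dt.events$Date) <= as.Date(dt.users$CreatedDate[index]))
  # Count the events per type and turn the counts into a one-row table
  dt.event.counter = as.data.table(t(as.data.table(table(dt.events.temp$EventType))))
  names(dt.event.counter) = as.character(unlist(dt.event.counter[1,]))
  dt.event.counter = dt.event.counter[-1]
  # Write the counts into this user's row of dt.master
  set(dt.master, index, names(dt.event.counter), dt.event.counter)
  index = index + 1
}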
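For comparison, a hedged sketch of how the loop could be replaced by a single non-equi join in data.table, followed by one grouped count. This is only one possible approach, not a tested solution for the full data. The sample rows are copied from the tables in this post, and the names `ev`, `us`, `matched`, `counts`, `grid` and `event.counts` are made up for this example. The date condition follows the written description (CreatedDate on or before Date); the loop above compares the other way round, so flip the operator in `on` if that is what is intended. It also assumes each Email appears only once in dt.users.

```r
library(data.table)

# Sample data copied from the tables in this post
dt.events <- data.table(
  Email     = paste0("User", c(1, 1, 1, 1, 2, 2, 2, 3, 3, 1, 1, 2, 3, 3), "@*.com"),
  EventType = paste0("Type", c(2, 6, 1, 3, 1, 1, 2, 1, 3, 2, 2, 5, 1, 4)),
  Date      = rep(c("2016-01-02", "2016-01-04"), times = c(9, 5))
)
dt.users <- data.table(
  Email       = paste0("User", 1:3, "@*.com"),
  CreatedDate = c("2016-01-01", "2016-01-03", "2016-01-03")
)

# Work on copies with real date columns; the non-equi join compares them directly
ev <- copy(dt.events)[, Date := as.IDate(Date)]
us <- copy(dt.users)[, CreatedDate := as.IDate(CreatedDate)]

# One join instead of one filter() per user:
# keep every event whose Email matches a user and whose Date >= CreatedDate
matched <- ev[us, on = .(Email, Date >= CreatedDate), nomatch = NULL]

# Count events per user and event type in a single grouped pass
counts <- matched[, .N, by = .(Email, EventType)]

# Zero-filled grid of every user x event type (the role dt.master plays above),
# then overwrite the zeros wherever a count exists
grid <- CJ(Email = us$Email, EventType = unique(ev$EventType))
grid[, N := 0L]
grid[counts, on = .(Email, EventType), N := i.N]

# Reshape to one row per user and one column per event type
event.counts <- dcast(grid, Email ~ EventType, value.var = "N")
print(event.counts)
```

The join and the grouped count each run once over the whole table rather than once per user, which is where most of the time in the loop is likely spent.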
The real data is far larger (9+ ..., 250k+ users, 150 ...), so this takes HOURS. Timing the loop for 500 users gives:
user system elapsed
179.33 62.92 242.60
I'm new to R, and Googling has not gotten me any further. Is there a faster way to do this?
Thanks!
TL;DR: Is there a faster way to count events per user and event type (only those that pass the date condition)?