Create a pivot table of custom event data

Edit 2: I realized that I can use dcast() to accomplish what I want to do. However, I do not want to read all the events in the event data, only those that occurred before the date specified in another data set. I can't figure out how to use the subset argument in dcast(). So far I have tried:

dcast(dt.events, Email ~ EventType, fun.aggregate = length, subset = as.Date(Date) <= 
as.Date(dt.users$CreatedDate[dt.users$Email = dt.events$Email]))

However, this does not work. I could add a CreatedDate column from dt.users to dt.events, and then subset using:

dcast(dt.events, Email ~ EventType, fun.aggregate = length,
      subset = .(as.Date(Date) <= as.Date(CreatedDate)))

I was wondering if it is possible to do this without adding an extra column?

Edit: I just calculated that it would probably take about 37 hours to complete the way I am doing it now, so if anyone has any hints to make it faster, please let me know :)

I'm new to R. I figured out a way to do what I want, but it is extremely inefficient and takes several hours.

I have the following:

Event Data:

UserID    Email         EventType    Date

User1     User1@*.com   Type2        2016-01-02
User1     User1@*.com   Type6        2016-01-02
User1     User1@*.com   Type1        2016-01-02
User1     User1@*.com   Type3        2016-01-02
User2     User2@*.com   Type1        2016-01-02
User2     User2@*.com   Type1        2016-01-02
User2     User2@*.com   Type2        2016-01-02
User3     User3@*.com   Type1        2016-01-02
User3     User3@*.com   Type3        2016-01-02
User1     User1@*.com   Type2        2016-01-04
User1     User1@*.com   Type2        2016-01-04
User2     User2@*.com   Type5        2016-01-04
User3     User3@*.com   Type1        2016-01-04
User3     User3@*.com   Type4        2016-01-04

Each time a user does something, the event is recorded with an event type and a timestamp.

List of users from a different database:

UserID    Email         CreatedDate

DxUs1     User1@*.com   2016-01-01
DxUs2     User2@*.com   2016-01-03
DxUs3     User3@*.com   2016-01-03

I want to get the following:

A summary table that counts, for each user in the user list, the number of events of each type in the event data. However, an event should only be counted if the user's "CreatedDate" in the user list is before or equal to the event's "Date" in the event data.

For the data above, the result should look like this:

Email         Type1    Type2    Type3    Type4     Type5     Type6
User1@*.com   1        3        1        0         0         1
User2@*.com   0        0        1        0         1         0
User3@*.com   1        0        0        1         0         0

To do this, I start with a blank master table, dt.master, which looks like this:

Email         Type1    Type2    Type3    Type4     Type5     Type6
User1@*.com   0        0        0        0         0         0
User2@*.com   0        0        0        0         0         0
User3@*.com   0        0        0        0         0         0

Then I fill it in, one user at a time, with a while loop:

# The data sets
dt.events # event data
dt.users # user list
dt.master # blank master table

# Loop that fills master table
counter_limit = group_size(dt.master)
index = 1

while (index <= counter_limit) {

    # Get events of user at current index
    dt.events.temp = filter(dt.events, dt.events$Email %in% dt.users$Email[index], 
                     as.Date(dt.events$Date) <= as.Date(dt.users$CreatedDate[index]))

    # Count all the different events
    dt.event.counter = as.data.table(t(as.data.table(table(dt.events.temp$EventType))))

    # Clean the counter by 1: Rename columns to event names, 2: Remove event names row
    names(dt.event.counter) = as.character(unlist(dt.event.counter[1,]))
    dt.event.counter = dt.event.counter[-1]

    # Fill the current index in on the blank master table
    set(dt.master, index, names(dt.event.counter), dt.event.counter)

    index = index + 1
}

The real data is much larger (9+ [...], 250k+ users, 150 [...]), and the loop takes HOURS to finish. For just 500 users, the timing is:

user    system    elapsed
179.33  62.92     242.60

I'm new to R, and Googling has not turned up a better approach. Is there a faster, more efficient way to do this?

Thanks!

TL;DR: my loop for building the pivot table is far too slow. Is there a faster way?

+4
2

With data.table, you can use the fun.aggregate argument of dcast:

dcast(dat, Email ~ EventType, fun.aggregate = length)

which gives:

         Email Type1 Type2 Type3 Type4 Type5 Type6
1: User1@*.com     1     2     1     0     0     1
2: User2@*.com     4     1     0     0     1     0
3: User3@*.com     0     1     1     1     0     0
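As a dependency-free cross-check of this kind of per-user, per-type count, base R's table() produces a result of the same shape. This is only a minimal sketch: it assumes the sample events listed in the question (not the data behind the output above), applies no date filter, and the name events is made up for illustration.

```r
# Sample events copied from the question; only the two columns needed here.
# `events` is a hypothetical name used for this sketch.
events <- data.frame(
  Email = c(rep("User1@*.com", 4), rep("User2@*.com", 3), rep("User3@*.com", 2),
            rep("User1@*.com", 2), "User2@*.com", rep("User3@*.com", 2)),
  EventType = c("Type2", "Type6", "Type1", "Type3",  # User1, 2016-01-02
                "Type1", "Type1", "Type2",           # User2, 2016-01-02
                "Type1", "Type3",                    # User3, 2016-01-02
                "Type2", "Type2",                    # User1, 2016-01-04
                "Type5",                             # User2, 2016-01-04
                "Type1", "Type4")                    # User3, 2016-01-04
)

# Cross-tabulate: one row per Email, one column per EventType, cells are counts.
counts <- table(events$Email, events$EventType)
print(counts)
```

Combinations that never occur show up as 0, which matches what fun.aggregate = length does in dcast.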

Update: to count only events on or after each user's CreatedDate, you can combine dcast with a non-equi join:

dcast(dt.events[dt.users, on = .(Email, Date >= CreatedDate)],
      Email ~ EventType, fun.aggregate = length)

which gives:

         Email Type1 Type2 Type3 Type4 Type5 Type6
1: User1@*.com     1     2     1     0     0     1
2: User2@*.com     1     0     0     0     1     0
3: User3@*.com     0     1     0     1     0     0
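To make the join above easy to try, here is a self-contained sketch that rebuilds the two tables from the sample rows in the question. A few things here are assumptions rather than part of the answer: the dates are stored as IDate in both tables so that the join columns have the same type, the UserID column is left out, and nomatch = NULL is added so that users without any qualifying event do not produce an NA column. Because it uses the question's sample rows, its counts differ from the output shown above.

```r
library(data.table)

# Sample data copied from the question (UserID omitted; not needed for the join).
dt.events <- data.table(
  Email = c(rep("User1@*.com", 4), rep("User2@*.com", 3), rep("User3@*.com", 2),
            rep("User1@*.com", 2), "User2@*.com", rep("User3@*.com", 2)),
  EventType = c("Type2", "Type6", "Type1", "Type3", "Type1", "Type1", "Type2",
                "Type1", "Type3", "Type2", "Type2", "Type5", "Type1", "Type4"),
  Date = as.IDate(c(rep("2016-01-02", 9), rep("2016-01-04", 5)))
)
dt.users <- data.table(
  Email = c("User1@*.com", "User2@*.com", "User3@*.com"),
  CreatedDate = as.IDate(c("2016-01-01", "2016-01-03", "2016-01-03"))
)

# Non-equi join: for each user, keep only events with Date >= CreatedDate.
# nomatch = NULL drops users that have no such event (an assumption of this sketch).
joined <- dt.events[dt.users, on = .(Email, Date >= CreatedDate), nomatch = NULL]

# Count events per user and type; dcast guesses the value column and says so
# in a message, which is harmless here because length() ignores the values.
result <- dcast(joined, Email ~ EventType, fun.aggregate = length)
print(result)
```

This avoids adding a CreatedDate column to dt.events by hand: the join does the matching and the date filter in one step, and dcast only sees the rows that survive it.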
+3

library(dplyr)
library(tidyr)
your.dataset %>%
  count(Email, EventType) %>%
  spread(EventType, n)
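The pipeline above counts every event. A possible way to respect the CreatedDate condition from the question in the same style is to join the user list first and filter on the dates before counting. This is a sketch under assumptions: the data frame names events and users are made up, the rows are the sample data from the question, and fill = 0 is added so that missing combinations show 0 instead of NA.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question; `events` and `users` are hypothetical names.
events <- data.frame(
  Email = c(rep("User1@*.com", 4), rep("User2@*.com", 3), rep("User3@*.com", 2),
            rep("User1@*.com", 2), "User2@*.com", rep("User3@*.com", 2)),
  EventType = c("Type2", "Type6", "Type1", "Type3", "Type1", "Type1", "Type2",
                "Type1", "Type3", "Type2", "Type2", "Type5", "Type1", "Type4"),
  Date = as.Date(c(rep("2016-01-02", 9), rep("2016-01-04", 5)))
)
users <- data.frame(
  Email = c("User1@*.com", "User2@*.com", "User3@*.com"),
  CreatedDate = as.Date(c("2016-01-01", "2016-01-03", "2016-01-03"))
)

result <- events %>%
  inner_join(users, by = "Email") %>%   # attach each user's CreatedDate
  filter(Date >= CreatedDate) %>%       # keep events on or after that date
  count(Email, EventType) %>%           # one row per user and event type
  spread(EventType, n, fill = 0)        # widen; absent combinations become 0
print(result)
```

Unlike a non-equi join, this materializes the full Email join before filtering, so it may use more memory on large event tables.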
+2

Source: https://habr.com/ru/post/1669383/

