R: Aggregating history by ID by date

I have a large data set with unique identifiers for individuals, as well as dates, and each individual can have several encounters.

Below is the code and an example of how this data might look:

strDates <- c("09/09/16", "6/7/16", "5/6/16", "2/3/16", "2/1/16", "11/8/16",      
"6/8/16", "5/8/16","2/3/16","1/1/16")
Date<-as.Date(strDates, "%m/%d/%y")
ID <- c("A", "A", "A", "A","A","B","B","B","B","B")
Event <- c(1,0,1,0,1,0,1,1,1,0)
sample_df <- data.frame(Date,ID,Event)

sample_df

         Date ID Event
1  2016-09-09  A     1
2  2016-06-07  A     0
3  2016-05-06  A     1
4  2016-02-03  A     0
5  2016-02-01  A     1
6  2016-11-08  B     0
7  2016-06-08  B     1
8  2016-05-08  B     1
9  2016-02-03  B     1
10 2016-01-01  B     0

I want to keep all the information attached to each encounter, but also aggregate the following historical information by ID:

  • The number of previous encounters
  • The number of previous events

As an example, consider row 2.

Row 2 has ID A, so I would look at rows 3-5 (which happened before the row 2 encounter). Within this group of rows, rows 3 and 5 had events.

The number of previous encounters for row 2 = 3

The number of previous events for row 2 = 2

Ideally, I would get the following output:

         Date ID Event PrevEnc PrevEvent
1  2016-09-09  A     1       4         2
2  2016-06-07  A     0       3         2
3  2016-05-06  A     1       2         1
4  2016-02-03  A     0       1         1
5  2016-02-01  A     1       0         0
6  2016-11-08  B     0       4         3
7  2016-06-08  B     1       3         2
8  2016-05-08  B     1       2         1
9  2016-02-03  B     1       1         0
10 2016-01-01  B     0       0         0

I have looked at dplyr, and at for-loops with if-then statements, but have not found an approach that works.

Thanks!


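This is fairly straightforward with dplyr. First sort the data by Date within each ID (saving the original row order so it can be restored afterwards). Then, within each ID, the number of previous encounters simply counts up from 0, and the number of previous events is the cumsum of Event. Wrapping Event in lag shifts it down one row, so each row counts only the events before it.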

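library(dplyr)

sample_df %>%
  mutate(origIndex = 1:n()) %>%           # remember the original row order
  group_by(ID) %>%
  arrange(ID, Date) %>%                   # oldest encounter first within each ID
  mutate(PrevEncounters = 0:(n() - 1),
         PrevEvents = cumsum(lag(Event, default = 0))) %>%
  arrange(origIndex) %>%                  # restore the original row order
  select(-origIndex)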

         Date     ID Event PrevEncounters PrevEvents
       <date> <fctr> <dbl>          <int>      <dbl>
1  2016-09-09      A     1              4          2
2  2016-06-07      A     0              3          2
3  2016-05-06      A     1              2          1
4  2016-02-03      A     0              1          1
5  2016-02-01      A     1              0          0
6  2016-11-08      B     0              4          3
7  2016-06-08      B     1              3          2
8  2016-05-08      B     1              2          1
9  2016-02-03      B     1              1          0
10 2016-01-01      B     0              0          0
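
For anyone without dplyr installed, the same sort, count, and restore idea can be sketched in base R. This is only an illustration, not part of the answer above: the names ord, sorted and base_res are made up here, and cumsum(x) - x stands in for cumsum(lag(x, default = 0)).

```r
# Rebuild the sample data from the question
strDates <- c("09/09/16", "6/7/16", "5/6/16", "2/3/16", "2/1/16",
              "11/8/16", "6/8/16", "5/8/16", "2/3/16", "1/1/16")
sample_df <- data.frame(
  Date  = as.Date(strDates, "%m/%d/%y"),
  ID    = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
  Event = c(1, 0, 1, 0, 1, 0, 1, 1, 1, 0)
)

# Sort by ID and Date, remembering where each row came from
ord    <- order(sample_df$ID, sample_df$Date)
sorted <- sample_df[ord, ]

# Within each ID: position minus one, and running event total minus the current row
sorted$PrevEnc   <- ave(sorted$Event, sorted$ID, FUN = function(x) seq_along(x) - 1)
sorted$PrevEvent <- ave(sorted$Event, sorted$ID, FUN = function(x) cumsum(x) - x)

# order(ord) is the inverse permutation, so this restores the original row order
base_res <- sorted[order(ord), ]
print(base_res)
```

On the sample data this should give the same PrevEnc and PrevEvent columns as the desired output in the question.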

Here is another approach, using data.table:

library(data.table)

# Convert to data.table and sort
sample_dt <- as.data.table(sample_df)
sample_dt <- sample_dt[order(Date)]

# Count the earlier events per ID: the running total of Event, minus 1 when the current row is itself an event
sample_dt[, prevEvent := ifelse(Event == 1, cumsum(Event) - 1, cumsum(Event)), by = "ID"]

# .I gives the row number, and .SD contains the Subset of the Data for each group
sample_dt[, prevEnc := .SD[,.I - 1], by = "ID"]

print(sample_dt)
          Date ID Event prevEvent prevEnc
 1: 2016-01-01  B     0         0       0
 2: 2016-02-01  A     1         0       0
 3: 2016-02-03  A     0         1       1
 4: 2016-02-03  B     1         0       1
 5: 2016-05-06  A     1         1       2
 6: 2016-05-08  B     1         1       2
 7: 2016-06-07  A     0         2       3
 8: 2016-06-08  B     1         2       3
 9: 2016-09-09  A     1         2       4
10: 2016-11-08  B     0         3       4
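
Because the result above is sorted by Date, it cannot be compared line by line with the desired output in the question. One way to check it is to compute both counts by brute force, straight from the definition. The sketch below uses base R only; count_history is a hypothetical helper name, and it assumes "previous" means a strictly earlier Date for the same ID.

```r
# Rebuild the sample data from the question
strDates <- c("09/09/16", "6/7/16", "5/6/16", "2/3/16", "2/1/16",
              "11/8/16", "6/8/16", "5/8/16", "2/3/16", "1/1/16")
sample_df <- data.frame(
  Date  = as.Date(strDates, "%m/%d/%y"),
  ID    = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
  Event = c(1, 0, 1, 0, 1, 0, 1, 1, 1, 0)
)

# Hypothetical helper: for every row, look at all rows with the same ID
# and a strictly earlier Date, then count them and sum their events
count_history <- function(df) {
  n <- nrow(df)
  prev_enc   <- integer(n)
  prev_event <- numeric(n)
  for (i in seq_len(n)) {
    earlier <- df$ID == df$ID[i] & df$Date < df$Date[i]
    prev_enc[i]   <- sum(earlier)
    prev_event[i] <- sum(df$Event[earlier])
  }
  data.frame(df, PrevEnc = prev_enc, PrevEvent = prev_event)
}

check <- count_history(sample_df)

# Show the check in Date order, for comparison with the data.table output
print(check[order(check$Date), ])
```

This does a full pass over the data for every row, so it is only meant as a check on small samples, not as a replacement for the grouped approaches on a large data set.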



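@Frank @MarkPeterson, here is a variant that does not sort by Date. It assumes that, within each ID, the rows are already in descending order of Date: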

library(dplyr)
res <- sample_df %>% group_by(ID) %>% 
                     mutate(PrevEnc=n()-row_number(),
                            PrevEvent=rev(cumsum(lag(rev(Event), default=0))))

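row_number() and n() are evaluated within each group (here, each ID). Because the rows are already in descending Date order, the number of previous encounters is n()-row_number(). For the same reason, Event has to be reversed with rev before cumsum and lag are applied, so that the running total starts from the oldest encounter. The outer rev then puts the result back in the original order.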

The result:

print(res)
##Source: local data frame [10 x 5]
##Groups: ID [2]
##
##         Date     ID Event PrevEnc PrevEvent
##       <date> <fctr> <dbl>   <int>     <dbl>
##1  2016-09-09      A     1       4         2
##2  2016-06-07      A     0       3         2
##3  2016-05-06      A     1       2         1
##4  2016-02-03      A     0       1         1
##5  2016-02-01      A     1       0         0
##6  2016-11-08      B     0       4         3
##7  2016-06-08      B     1       3         2
##8  2016-05-08      B     1       2         1
##9  2016-02-03      B     1       1         0
##10 2016-01-01      B     0       0         0
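
To see why the double rev is needed, it can help to trace the Event values for ID A by hand. The sketch below uses base R only; lag0 is a hypothetical stand-in for dplyr's lag(x, default = 0), and the variable names are made up for this illustration.

```r
# Event values for ID "A", newest encounter first, as in the question's data
event_desc <- c(1, 0, 1, 0, 1)

# Stand-in for lag(x, default = 0): shift right by one position, pad with 0
lag0 <- function(x) c(0, head(x, -1))

oldest_first <- rev(event_desc)              # oldest encounter first
prior_events <- cumsum(lag0(oldest_first))   # events strictly before each row
prev_event   <- rev(prior_events)            # back to newest-first order
print(prev_event)

# Without the reversals, the running total counts later events instead
no_rev <- cumsum(lag0(event_desc))
print(no_rev)
```

The first vector matches the PrevEvent column for ID A in the output above; the second does not, because the total runs from the newest encounter instead of the oldest.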

Source: https://habr.com/ru/post/1666178/

