R: calculate the number of occurrences of a particular event at a specified time

Question


My simplified data is as follows:

    set.seed(1453)
    x = sample(0:1, 10, TRUE)
    date = c('2016-01-01', '2016-01-05', '2016-01-07', '2016-01-12', '2016-01-16',
             '2016-01-20', '2016-01-20', '2016-01-25', '2016-01-26', '2016-01-31')
    df = data.frame(x, date = as.Date(date))
    df

    x       date
    1 2016-01-01
    0 2016-01-05
    1 2016-01-07
    0 2016-01-12
    0 2016-01-16
    1 2016-01-20
    1 2016-01-20
    0 2016-01-25
    0 2016-01-26
    1 2016-01-31

I would like to calculate the number of occurrences of x == 1 within a certain period of time, for example 14 and 30 days from the current date (but excluding the current record if it has x == 1). The desired result would look like this:

    solution

    x       date x_plus14 x_plus30
    1 2016-01-01        1        3
    0 2016-01-05        1        4
    1 2016-01-07        2        3
    0 2016-01-12        2        3
    0 2016-01-16        2        3
    1 2016-01-20        2        2
    1 2016-01-20        1        1
    0 2016-01-25        1        1
    0 2016-01-26        1        1
    1 2016-01-31        0        0

Ideally, I would like to do this in dplyr, but that is not necessary. Any ideas how to achieve this? Many thanks for your help!

+4

date r aggregate dplyr

Kasia Kulma Jan 11 '17 at 14:27


5 answers

Previously, I did not include the current date and therefore the numbers did not match.

    library(data.table)
    setDT(df)[, `:=`(x14 = sum(df$x[between(df$date, date, date + 14, incbounds = FALSE)]),
                     x30 = sum(df$x[between(df$date, date, date + 30, incbounds = FALSE)])),
              by = date]
    #     x       date x14 x30
    #  1: 1 2016-01-01   1   3
    #  2: 0 2016-01-05   1   4
    #  3: 1 2016-01-07   2   3
    #  4: 0 2016-01-12   2   3
    #  5: 0 2016-01-16   2   3
    #  6: 1 2016-01-20   1   1
    #  7: 1 2016-01-20   1   1
    #  8: 0 2016-01-25   1   1
    #  9: 0 2016-01-26   1   1
    # 10: 1 2016-01-31   0   0

Or a general solution that works for any set of desired ranges:

    vec <- c(14, 30)  # Specify desired ranges
    setDT(df)[, paste0("x", vec) := lapply(vec, function(i)
                  sum(df$x[between(df$date, date, date + i, incbounds = FALSE)])),
              by = date]

+4

joel.wilson Jan 11 '17 at 14:54


Here is my attempt at it using dplyr + purrr. I got slightly different counts because of the comparison operators (> and <=) in the helper function x_next(); if you adjust them, I think you can get what you want. HTH.

    library("tidyverse")
    library("lubridate")

    set.seed(1453)
    x = sample(0:1, 10, TRUE)
    dates = c('2016-01-01', '2016-01-05', '2016-01-07', '2016-01-12', '2016-01-16',
              '2016-01-20', '2016-01-20', '2016-01-25', '2016-01-26', '2016-01-31')
    df = data_frame(x = x, dates = lubridate::as_date(dates))

    # helper function to calculate the sum of xs in the next days_in_future
    x_next <- function(d, days_in_future) {
      df %>%
        # subset on days of interest
        filter(dates > d & dates <= d + days(days_in_future)) %>%
        # sum up xs
        summarise(sum = sum(x)) %>%
        # have to unlist them so that the (following) call to mutate works
        unlist(use.names=F)
    }

    # mutate your df
    df %>%
      mutate(xplus14 = map(dates, x_next, 14),
             xplus30 = map(dates, x_next, 30))
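To see how much the choice of comparison operator matters, here is a minimal base R sketch. It assumes the x values printed in the question (hard-coded here rather than re-sampled, since the output of sample() can differ between R versions), and count_window() is a hypothetical helper written only for this illustration.

```r
# x and dates copied from the table printed in the question
x <- c(1, 0, 1, 0, 0, 1, 1, 0, 0, 1)
dates <- as.Date(c("2016-01-01", "2016-01-05", "2016-01-07", "2016-01-12",
                   "2016-01-16", "2016-01-20", "2016-01-20", "2016-01-25",
                   "2016-01-26", "2016-01-31"))

# Hypothetical helper: sum x over the window after d, with the upper
# bound either included (<=) or excluded (<)
count_window <- function(d, days, include_end) {
  upper <- d + days
  in_window <- if (include_end) {
    dates > d & dates <= upper
  } else {
    dates > d & dates < upper
  }
  sum(x[in_window])
}

# 2016-01-31 lies exactly 30 days after 2016-01-01,
# so only the inclusive version counts it
with_end    <- count_window(dates[1], 30, include_end = TRUE)
without_end <- count_window(dates[1], 30, include_end = FALSE)
```

For the first row, the inclusive bound picks up the record on 2016-01-31 and the exclusive bound does not, which is the difference between 4 and the 3 shown in the question's desired output.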

+2

davidski Jan 11 '17 at 15:11


A solution using dplyr and purrr:

    library(tidyverse)
    sample %>%
      mutate(x_plus14 = map(date, ~sum(x == 1 & between(date, . + 1, . + 14))),
             x_plus30 = map(date, ~sum(x == 1 & between(date, . + 1, . + 30))))

       x       date x_plus14 x_plus30
    1  1 2016-01-01        1        4
    2  0 2016-01-05        1        4
    3  1 2016-01-07        2        3
    4  0 2016-01-12        2        3
    5  0 2016-01-16        2        3
    6  1 2016-01-20        1        1
    7  1 2016-01-20        1        1
    8  0 2016-01-25        1        1
    9  0 2016-01-26        1        1
    10 1 2016-01-31        0        0

+2

Axeman Jan 11 '17 at 15:26


As already mentioned, it is strange that you do not count the current day, and you should avoid naming objects after functions (sample). However, the code below reproduces the desired result:

    set.seed(1453)
    x = sample(0:1, 10, TRUE)
    date = c('2016-01-01', '2016-01-05', '2016-01-07', '2016-01-12', '2016-01-16',
             '2016-01-20', '2016-01-20', '2016-01-25', '2016-01-26', '2016-01-31')
    sample = data.frame(x = x, date = as.Date(date))

    # sum x over rows dated strictly after one_row's date and
    # strictly before that date + date_range
    getOccurences <- function(one_row, sample_data, date_range){
      one_date <- as.Date(one_row[2])
      sum(sample_data$x[sample_data$date > one_date &
                        sample_data$date < one_date + date_range])
    }

    sample$x_plus14 <- apply(sample, 1, getOccurences, sample, 14)
    sample$x_plus30 <- apply(sample, 1, getOccurences, sample, 30)
    sample

       x       date x_plus14 x_plus30
    1  1 2016-01-01        1        3
    2  0 2016-01-05        1        4
    3  1 2016-01-07        2        3
    4  0 2016-01-12        2        3
    5  0 2016-01-16        2        3
    6  1 2016-01-20        1        1
    7  1 2016-01-20        1        1
    8  0 2016-01-25        1        1
    9  0 2016-01-26        1        1
    10 1 2016-01-31        0        0
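Note that row 6 in the output above shows 1 and 1, while the question's desired output has 2 and 2 for the first of the two 2016-01-20 records. A purely date-based filter cannot tell the two tied rows apart. The following sketch is one possible way to count by row position instead. It assumes the rows are already sorted by date and uses the x values printed in the question; count_after_row() is a hypothetical name, not something from the answer above.

```r
# x and date copied from the table printed in the question
x <- c(1, 0, 1, 0, 0, 1, 1, 0, 0, 1)
date <- as.Date(c("2016-01-01", "2016-01-05", "2016-01-07", "2016-01-12",
                  "2016-01-16", "2016-01-20", "2016-01-20", "2016-01-25",
                  "2016-01-26", "2016-01-31"))

# Hypothetical helper: for each row i, sum x over the rows below it
# (including later rows with the same date) dated before date[i] + days.
# Assumes the input is sorted by date.
count_after_row <- function(x, date, days) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    later <- seq_len(n) > i
    sum(x[later & date < date[i] + days])
  }, numeric(1))
}

x_plus14 <- count_after_row(x, date, 14)
x_plus30 <- count_after_row(x, date, 30)
```

This is a brute-force loop over all rows, so it is meant as a readable reference for small data rather than a fast solution.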

+1

Kamil S Jaron Jan 11 '17 at 15:06


alexis_laz · Accepted Answer · 2017-01-11T17:48:00+0000

Adding another alternative, based on findInterval:

    cs = cumsum(df$x) # cumulative number of occurences
    data.frame(df,
               plus14 = cs[findInterval(df$date + 14, df$date, left.open = TRUE)] - cs,
               plus30 = cs[findInterval(df$date + 30, df$date, left.open = TRUE)] - cs)
    #   x       date plus14 plus30
    #1  1 2016-01-01      1      3
    #2  0 2016-01-05      1      4
    #3  1 2016-01-07      2      3
    #4  0 2016-01-12      2      3
    #5  0 2016-01-16      2      3
    #6  1 2016-01-20      2      2
    #7  1 2016-01-20      1      1
    #8  0 2016-01-25      1      1
    #9  0 2016-01-26      1      1
    #10 1 2016-01-31      0      0
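The idea can be unpacked step by step. With left.open = TRUE, findInterval returns, for each date + n, the number of dates strictly before it, which is the index of the last row inside the window. Subtracting the running total at the current row then leaves only the occurrences in later rows. The sketch below wraps this in a hypothetical helper, count_ahead(), and hard-codes the x values printed in the question; it assumes the data is sorted by date and that days is positive.

```r
# Data copied from the table printed in the question
df <- data.frame(
  x = c(1, 0, 1, 0, 0, 1, 1, 0, 0, 1),
  date = as.Date(c("2016-01-01", "2016-01-05", "2016-01-07", "2016-01-12",
                   "2016-01-16", "2016-01-20", "2016-01-20", "2016-01-25",
                   "2016-01-26", "2016-01-31"))
)

# Hypothetical helper wrapping the cumsum + findInterval idea
count_ahead <- function(x, date, days) {
  cs <- cumsum(x)  # running total of occurrences up to and including each row
  # index of the last row dated strictly before date + days
  last <- findInterval(date + days, date, left.open = TRUE)
  cs[last] - cs    # occurrences in rows after the current one, up to 'last'
}

# Same idea as the general data.table solution: loop over several ranges
ranges <- c(14, 30)
counts <- lapply(ranges, function(n) count_ahead(df$x, df$date, n))
names(counts) <- paste0("plus", ranges)
result <- data.frame(df, counts)
```

Because the subtraction uses the cumulative sum at the row itself, the first of the two 2016-01-20 records still counts the second one, which matches the question's desired output for those tied rows.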
