R: calculate the number of occurrences of a particular event at a specified time

My simplified data is as follows:

    set.seed(1453)
    x = sample(0:1, 10, TRUE)
    date = c('2016-01-01', '2016-01-05', '2016-01-07', '2016-01-12', '2016-01-16',
             '2016-01-20', '2016-01-20', '2016-01-25', '2016-01-26', '2016-01-31')
    df = data.frame(x, date = as.Date(date))

    df
    x date
    1 2016-01-01
    0 2016-01-05
    1 2016-01-07
    0 2016-01-12
    0 2016-01-16
    1 2016-01-20
    1 2016-01-20
    0 2016-01-25
    0 2016-01-26
    1 2016-01-31

I would like to calculate the number of occurrences of x == 1 within a certain period of time, for example 14 and 30 days from the current date (excluding the current record, even if it has x == 1). The desired result looks like this:

    solution
    x date       x_plus14 x_plus30
    1 2016-01-01 1        3
    0 2016-01-05 1        4
    1 2016-01-07 2        3
    0 2016-01-12 2        3
    0 2016-01-16 2        3
    1 2016-01-20 2        2
    1 2016-01-20 1        1
    0 2016-01-25 1        1
    0 2016-01-26 1        1
    1 2016-01-31 0        0

Ideally, I would like it to be in dplyr, but this is not necessary. Any ideas how to achieve this? Many thanks for your help!

5 answers

Adding another alternative, based on findInterval:

    cs = cumsum(df$x)  # cumulative number of occurrences
    data.frame(df,
               plus14 = cs[findInterval(df$date + 14, df$date, left.open = TRUE)] - cs,
               plus30 = cs[findInterval(df$date + 30, df$date, left.open = TRUE)] - cs)
    #   x       date plus14 plus30
    #1  1 2016-01-01      1      3
    #2  0 2016-01-05      1      4
    #3  1 2016-01-07      2      3
    #4  0 2016-01-12      2      3
    #5  0 2016-01-16      2      3
    #6  1 2016-01-20      2      2
    #7  1 2016-01-20      1      1
    #8  0 2016-01-25      1      1
    #9  0 2016-01-26      1      1
    #10 1 2016-01-31      0      0
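To see why this works, here is a small sketch on the same data, with the x values typed in by hand so that it does not depend on the random number generator. For sorted dates, findInterval(date + n, date, left.open = TRUE) gives, for each row, the number of dates strictly before date + n. Indexing the cumulative sum at that position counts the ones up to the end of the window, and subtracting the running total at the current row leaves only the ones further down. The helper name count_ahead is made up for this illustration, and the sketch assumes n > 0 and dates in ascending order.

```r
# Same data as in the question, entered by hand
x <- c(1, 0, 1, 0, 0, 1, 1, 0, 0, 1)
date <- as.Date(c("2016-01-01", "2016-01-05", "2016-01-07", "2016-01-12",
                  "2016-01-16", "2016-01-20", "2016-01-20", "2016-01-25",
                  "2016-01-26", "2016-01-31"))

cs <- cumsum(x)  # running total of x == 1, row by row

# count_ahead is a hypothetical helper, not part of any package
count_ahead <- function(n) {
  d <- as.numeric(date)
  # index of the last row whose date is strictly before date + n
  # (always >= 1 here because n > 0 and the dates are sorted)
  last_in_window <- findInterval(d + n, d, left.open = TRUE)
  # ones up to the end of the window minus ones up to and including this row
  cs[last_in_window] - cs
}

plus14 <- count_ahead(14)
plus30 <- count_ahead(30)
print(data.frame(x, date, plus14, plus30))
```

Because the subtraction uses the row's own position in the cumulative sum, the first of the two 2016-01-20 rows still counts the second one, which is what the desired output asks for.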

Previously, I did not include the current date and therefore the numbers did not match.

    library(data.table)
    setDT(df)[, `:=`(x14 = sum(df$x[between(df$date, date, date + 14, incbounds = FALSE)]),
                     x30 = sum(df$x[between(df$date, date, date + 30, incbounds = FALSE)])),
              by = date]
    #     x       date x14 x30
    #  1: 1 2016-01-01   1   3
    #  2: 0 2016-01-05   1   4
    #  3: 1 2016-01-07   2   3
    #  4: 0 2016-01-12   2   3
    #  5: 0 2016-01-16   2   3
    #  6: 1 2016-01-20   1   1
    #  7: 1 2016-01-20   1   1
    #  8: 0 2016-01-25   1   1
    #  9: 0 2016-01-26   1   1
    # 10: 1 2016-01-31   0   0

Or a general solution that will work for any desired range:

    vec <- c(14, 30)  # Specify desired ranges
    setDT(df)[, paste0("x", vec) := lapply(vec, function(i)
                  sum(df$x[between(df$date, date, date + i, incbounds = FALSE)])),
              by = date]

Here is my take on it with some help from dplyr + purrr. My numbers differ slightly because of how the comparison operators in the helper function x_next() treat the window boundaries; if you adjust them, I think you can get what you want. HTH.

    library("tidyverse")
    library("lubridate")

    set.seed(1453)
    x = sample(0:1, 10, TRUE)
    dates = c('2016-01-01', '2016-01-05', '2016-01-07', '2016-01-12', '2016-01-16',
              '2016-01-20', '2016-01-20', '2016-01-25', '2016-01-26', '2016-01-31')
    df = data_frame(x = x, dates = lubridate::as_date(dates))

    # helper function to calculate the sum of xs in the next days_in_future
    x_next <- function(d, days_in_future) {
      df %>%
        # subset on days of interest
        filter(dates > d & dates <= d + days(days_in_future)) %>%
        # sum up xs
        summarise(sum = sum(x)) %>%
        # have to unlist them so that the (following) call to mutate works
        unlist(use.names=F)
    }

    # mutate your df
    df %>%
      mutate(xplus14 = map(dates, x_next, 14),
             xplus30 = map(dates, x_next, 30))
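As a rough sketch of the boundary adjustment mentioned above: making the upper comparison strict (< instead of <=) brings the 30-day count for 2016-01-01 down to 3, as in the desired output. The function name x_next_strict and its extra data argument are made up for this illustration, map_dbl() is used so the new columns are plain numeric vectors rather than lists, and the x values are typed in by hand instead of sampled.

```r
library(dplyr)
library(purrr)

df <- tibble(
  x = c(1, 0, 1, 0, 0, 1, 1, 0, 0, 1),
  dates = as.Date(c("2016-01-01", "2016-01-05", "2016-01-07", "2016-01-12",
                    "2016-01-16", "2016-01-20", "2016-01-20", "2016-01-25",
                    "2016-01-26", "2016-01-31"))
)

# Hypothetical variant of x_next(): both window bounds are exclusive
x_next_strict <- function(d, days_in_future, data) {
  in_window <- data$dates > d & data$dates < d + days_in_future
  sum(data$x[in_window])
}

res <- df %>%
  mutate(xplus14 = map_dbl(dates, ~ x_next_strict(.x, 14, df)),
         xplus30 = map_dbl(dates, ~ x_next_strict(.x, 30, df)))

print(res)
```

Since the comparison is on dates only, the two 2016-01-20 records get the same count (1), whereas the desired output has 2 for the first of them.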

A solution using dplyr and purrr:

    library(tidyverse)

    sample %>%
      mutate(x_plus14 = map(date, ~sum(x == 1 & between(date, . + 1, . + 14))),
             x_plus30 = map(date, ~sum(x == 1 & between(date, . + 1, . + 30))))

       x       date x_plus14 x_plus30
    1  1 2016-01-01        1        4
    2  0 2016-01-05        1        4
    3  1 2016-01-07        2        3
    4  0 2016-01-12        2        3
    5  0 2016-01-16        2        3
    6  1 2016-01-20        1        1
    7  1 2016-01-20        1        1
    8  0 2016-01-25        1        1
    9  0 2016-01-26        1        1
    10 1 2016-01-31        0        0

As already mentioned, it is odd that you do not count the current day, and you should avoid naming objects after functions (sample). However, the code below reproduces the desired result, except for row 6 (the first 2016-01-20 record), where it returns 1 instead of 2:

    set.seed(1453)
    x = sample(0:1, 10, TRUE)
    date = c('2016-01-01', '2016-01-05', '2016-01-07', '2016-01-12', '2016-01-16',
             '2016-01-20', '2016-01-20', '2016-01-25', '2016-01-26', '2016-01-31')
    sample = data.frame(x = x, date = as.Date(date))

    # sum x over rows dated strictly after the row's date and strictly before date + date_range
    getOccurences <- function(one_row, sample_data, date_range) {
      one_date <- as.Date(one_row[2])
      sum(sample_data$x[sample_data$date > one_date & sample_data$date < one_date + date_range])
    }

    sample$x_plus14 <- apply(sample, 1, getOccurences, sample, 14)
    sample$x_plus30 <- apply(sample, 1, getOccurences, sample, 30)

    sample
       x       date x_plus14 x_plus30
    1  1 2016-01-01        1        3
    2  0 2016-01-05        1        4
    3  1 2016-01-07        2        3
    4  0 2016-01-12        2        3
    5  0 2016-01-16        2        3
    6  1 2016-01-20        1        1
    7  1 2016-01-20        1        1
    8  0 2016-01-25        1        1
    9  0 2016-01-26        1        1
    10 1 2016-01-31        0        0
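A comparison on dates alone cannot tell the two 2016-01-20 records apart, so both get the same count. If "later" is read as "further down the table", a minimal base R sketch along the following lines matches the table in the question row for row. It assumes the rows are already sorted by date. The helper name count_later is made up for this illustration, and the x values are typed in by hand.

```r
x <- c(1, 0, 1, 0, 0, 1, 1, 0, 0, 1)
date <- as.Date(c("2016-01-01", "2016-01-05", "2016-01-07", "2016-01-12",
                  "2016-01-16", "2016-01-20", "2016-01-20", "2016-01-25",
                  "2016-01-26", "2016-01-31"))

# Hypothetical helper: for row i, sum x over the rows below it
# whose date is strictly before date[i] + n
count_later <- function(i, n) {
  below <- seq_along(x) > i
  sum(x[below & date < date[i] + n])
}

x_plus14 <- vapply(seq_along(x), count_later, numeric(1), n = 14)
x_plus30 <- vapply(seq_along(x), count_later, numeric(1), n = 30)

print(data.frame(x, date, x_plus14, x_plus30))
```

This loops over every pair of rows, so it is only meant to pin down the intended semantics; for larger data a cumulative-sum approach scales better.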

Source: https://habr.com/ru/post/1262996/

