Counting events only every X days per subject (in an irregular time series)

I have data where I count episodes of care (e.g. ER visits). The trick is that I can’t count on every visit, because sometimes the second or third visit is actually a continuation of the previous problem. Thus, they gave me a direction for counting visits using a 30-day “clean period” or “waiting period”, so I looked for the first event (VISIT 1) by the patient (minimum date), I consider this event, then apply the rules so that NOT Count any visits that occur within 30 days after the first event. After the 30-day window has passed, I can start searching for the second visit (VISIT 2), count it, and then apply the 30-day black color again (not counting the visits that occur 30 days after visit No. 2) .. wash, rinse, repeat ...

I falsified a very messy solution that requires a lot of babysitting and manual step checking (see below). I have to believe that there is a better way. HELP!

data1 <- structure(list(ID = structure(c(2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L), .Label = c("", "patient1", "patient2", "patient3"), class = "factor"), Date = structure(c(14610, 14610, 14627, 14680, 14652, 14660, 14725, 15085, 15086, 14642, 14669, 14732, 14747, 14749), class = "Date"), test = c(1L, 1L, 1L, 2L, 1L, 1L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 2L)), .Names = c("ID", "Date", "test"), class = "data.frame", row.names = c(NA, 14L)) library(doBy) ## create a table of first events step1 <- summaryBy(Date~ID, data = data1, FUN=min) step1$Date30 <- step1$Date.min+30 step2 <- merge(data1, step1, by.x="ID", by.y="ID") ## use an ifelse to essentially remove any events that shouldn't be counted step2$event <- ifelse(as.numeric(step2$Date) >= step2$Date.min & as.numeric(step2$Date) <= step2$Date30, 0, 1) ## basically repeat steps above until I dont capture any more events ## there just has to be a better way data3 <- step2[step2$event==1,] data3<- data3[,1:3] step3 <- summaryBy(Date~ID, data = data3, FUN=min) step3$Date30 <- step3$Date.min+30 step4 <- merge(data3, step3, by.x="ID", by.y="ID") step4$event <- ifelse(as.numeric(step4$Date) >= step4$Date.min & as.numeric(step4$Date) <= step4$Date30, 0, 1) data4 <- step4[step4$event==1,] data4<- data4[,1:3] step5 <- summaryBy(Date~ID, data = data4, FUN=min) step5$Date30 <- step5$Date.min+30 ## then I rbind the "keepers" ## in this case steps 1 and 3 above final <- rbind(step1,step3, step5) ## then reformat final <- final[,1:2] final$Date.min <- as.Date(final$Date.min,origin="1970-01-01") ## again, extremely clumsy, but it works... HELP! :) 
+4
source share
3 answers

This solution is without a loop and uses only the R base. It creates a logical vector ok that selects valid rows data1 .

ave performs the specified anonymous function for each patient separately.

We define a state vector consisting of the current date and the beginning of the period for which other dates are not considered. Each date is represented as.numeric(x) , where x is the date. step accepts the state vector and current date and updates the state vector. Reduce runs it according to the data, and then we only accept results for which the minimum and current dates are the same and for which the current date is not a duplicate.

 step <- function(init, curdate) { c(curdate, if (curdate > init[2] + 30) curdate else init[2]) } ok <- !!ave(as.numeric(data1$Date), paste(data1$ID), FUN = function(d) { x <- do.call("rbind", Reduce(step, d, c(-Inf, 0), acc = TRUE)) x[-1,1] == x[-1,2] & !duplicated(x[-1,1]) }) data1[ok, ] 
+6
source

Since such manipulations are not simple and error prone, I would write a separate function to discard events during the blackout period. The function contains a loop, which basically does what you did manually, while there is nothing to do.

 blackout <- function(dates, period=30) { dates <- sort(dates) while( TRUE ) { spell <- as.numeric(diff(dates)) <= period if(!any(spell)) { return(dates) } i <- which(spell)[1] + 1 dates <- dates[-i] } } # Tests stopifnot( length( blackout( seq.Date(Sys.Date(), Sys.Date()+50, by=1) ) ) == 2 ) stopifnot( length( blackout( seq.Date(Sys.Date(), by=31, length=5) ) ) == 5 ) 

It can be used as follows.

 library(plyr) ddply(data1, "ID", summarize, Date=blackout(Date)) 
+2
source

What about

 do.call('rbind', lapply(split(data1, factor(data1$ID)), function(x) (x <- x[order(x$Date),])[c(T, diff(x$Date) > 30),])) 
+1
source

Source: https://habr.com/ru/post/1402778/


All Articles