While preparing an answer to the question dplyr or data.table to calculate the accumulation of time series in R, I noticed that I get different results depending on whether the table is updated in place or returned as a new object. I also get different results when I change the order of the columns in the conditions of a non-equi join.
At the moment I have no explanation for this. Perhaps it is due to a major misunderstanding on my side, or just a simple coding error.
Please note that this question asks for an explanation of the observed behaviour of data.table joins. If you have alternative solutions to the underlying problem, please feel free to post an answer to the original question.
Original question and working answer
The original question was how to count, for each patient, the number of hospitalizations within the 365 days before each hospitalization (including the current one), using this data:
library(data.table)
DT0 <- data.table(
patient.id = c(1L, 2L, 1L, 1L, 2L, 2L, 2L),
hospitalization.date = as.Date(c("2013/10/15", "2014/10/15", "2015/7/16", "2016/1/7",
"2015/12/20", "2015/12/25", "2016/2/10")))
setorder(DT0, patient.id, hospitalization.date)
DT0
patient.id hospitalization.date
1: 1 2013-10-15
2: 1 2015-07-16
3: 1 2016-01-07
4: 2 2014-10-15
5: 2 2015-12-20
6: 2 2015-12-25
7: 2 2016-02-10
The code below returns the expected answer (helper columns added for clarity):
DT0[, start.date := hospitalization.date - 365][
, end.date := hospitalization.date][]
DT0
patient.id hospitalization.date start.date end.date
1: 1 2013-10-15 2012-10-15 2013-10-15
2: 1 2015-07-16 2014-07-16 2015-07-16
3: 1 2016-01-07 2015-01-07 2016-01-07
4: 2 2014-10-15 2013-10-15 2014-10-15
5: 2 2015-12-20 2014-12-20 2015-12-20
6: 2 2015-12-25 2014-12-25 2015-12-25
7: 2 2016-02-10 2015-02-10 2016-02-10
result <- DT0[DT0, on = c("patient.id", "hospitalization.date>=start.date",
"hospitalization.date<=end.date"),
.(hospitalizations.last.year = .N), by = .EACHI][]
result
patient.id hospitalization.date hospitalization.date hospitalizations.last.year
1: 1 2012-10-15 2013-10-15 1
2: 1 2014-07-16 2015-07-16 1
3: 1 2015-01-07 2016-01-07 2
4: 2 2013-10-15 2014-10-15 1
5: 2 2014-12-20 2015-12-20 1
6: 2 2014-12-25 2015-12-25 2
7: 2 2015-02-10 2016-02-10 3
This is the expected answer, except for the renamed and duplicated column names (which I have left as they are for comparison).
For patient.id == 2, the last row shows 3 because this patient's hospitalization on 2016-02-10 is the third one since 2015-02-10 (after 2015-12-20 and 2015-12-25).
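To double-check the expected counts without any join, here is a minimal brute-force sketch in base R. The vectors `patient.id` and `hosp.date` are hypothetical stand-ins that hold the same data in the same order as DT0 after `setorder()`. For every row, the sketch counts the rows of the same patient whose date lies in the closed window [date - 365, date].

```r
# Hypothetical stand-ins for DT0's columns, in the order produced by
# setorder(DT0, patient.id, hospitalization.date)
patient.id <- c(1L, 1L, 1L, 2L, 2L, 2L, 2L)
hosp.date <- as.Date(c("2013-10-15", "2015-07-16", "2016-01-07",
                       "2014-10-15", "2015-12-20", "2015-12-25", "2016-02-10"))

# For row k, count rows of the same patient whose date falls in the closed
# window [hosp.date[k] - 365, hosp.date[k]] (the current stay is included)
expected <- vapply(seq_along(hosp.date), function(k) {
  sum(patient.id == patient.id[k] &
        hosp.date >= hosp.date[k] - 365 &
        hosp.date <= hosp.date[k])
}, integer(1))

expected
```

It returns 1 1 2 1 1 2 3, the same as the hospitalizations.last.year column of result.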
result is a new data.table. To add the counts as a new column to the existing data.table instead, I tried updating it in place:
DT <- copy(DT0)
DT[DT, on = c("patient.id", "hospitalization.date>=start.date",
"hospitalization.date<=end.date"),
hospitalizations.last.year := .N, by = .EACHI]
DT
patient.id hospitalization.date start.date end.date hospitalizations.last.year
1: 1 2013-10-15 2012-10-15 2013-10-15 1
2: 1 2015-07-16 2014-07-16 2015-07-16 2
3: 1 2016-01-07 2015-01-07 2016-01-07 2
4: 2 2014-10-15 2013-10-15 2014-10-15 1
5: 2 2015-12-20 2014-12-20 2015-12-20 3
6: 2 2015-12-25 2014-12-25 2015-12-25 3
7: 2 2016-02-10 2015-02-10 2016-02-10 3
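In DT, rows 5 and 6 contain 3 instead of 1 and 2, respectively, and row 2 contains 2 instead of 1. So the update in place does not reproduce result.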
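One hypothesis that could be tested: a minimal base-R emulation, assuming that `:=` combined with `by = .EACHI` writes each group's `.N` into the matched rows of x (not of i), and that a row of x matched by several rows of i keeps the value written by the last of them. This is only an assumption about the semantics, not a statement of how data.table is implemented; `patient.id`, `hosp.date`, `start.date`, `end.date` and `out` are hypothetical names for this sketch.

```r
# Hypothetical stand-ins for DT0's columns (same order as DT0)
patient.id <- c(1L, 1L, 1L, 2L, 2L, 2L, 2L)
hosp.date <- as.Date(c("2013-10-15", "2015-07-16", "2016-01-07",
                       "2014-10-15", "2015-12-20", "2015-12-25", "2016-02-10"))
start.date <- hosp.date - 365
end.date <- hosp.date

# ASSUMPTION: each i row writes its .N into all x rows it matches,
# and later i rows overwrite values written by earlier ones
out <- rep(NA_integer_, length(hosp.date))
for (k in seq_along(hosp.date)) {   # k plays the role of the i row
  # x rows matched by i row k: same patient and x's hospitalization.date
  # inside i's [start.date, end.date] window
  hits <- which(patient.id == patient.id[k] &
                  hosp.date >= start.date[k] &
                  hosp.date <= end.date[k])
  out[hits] <- length(hits)         # .N of this group, overwriting earlier values
}

out
```

Under this assumption the emulation gives 1 2 2 1 3 3 3, which coincides with the hospitalizations.last.year column of DT above.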
Changing the order of the columns in the non-equi join conditions gives yet another result:
result <- DT0[DT0, on = c("patient.id", "start.date<=hospitalization.date",
"end.date>=hospitalization.date"),
.(hospitalizations.last.year = .N), by = .EACHI][]
result
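The only change is that "hospitalization.date>=start.date" has been rewritten as "start.date<=hospitalization.date" (operands exchanged, > replaced by <), and likewise for end.date, so I expected the same counts. Instead I get: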
patient.id start.date end.date hospitalizations.last.year
1: 1 2013-10-15 2013-10-15 1
2: 1 2015-07-16 2015-07-16 2
3: 1 2016-01-07 2016-01-07 1
4: 2 2014-10-15 2014-10-15 1
5: 2 2015-12-20 2015-12-20 3
6: 2 2015-12-25 2015-12-25 2
7: 2 2016-02-10 2016-02-10 1
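A possible reading of these numbers, as a base-R sketch: assuming that in an `on` condition such as "start.date<=hospitalization.date" the left-hand column belongs to x and the right-hand column to i, the swapped version counts, for every row of i, the rows of x whose [start.date, end.date] window contains i's hospitalization.date. The variable names below are hypothetical stand-ins for the columns of DT0.

```r
# Hypothetical stand-ins for DT0's columns (same order as DT0)
patient.id <- c(1L, 1L, 1L, 2L, 2L, 2L, 2L)
hosp.date <- as.Date(c("2013-10-15", "2015-07-16", "2016-01-07",
                       "2014-10-15", "2015-12-20", "2015-12-25", "2016-02-10"))
start.date <- hosp.date - 365
end.date <- hosp.date

# ASSUMPTION: LHS of each on-condition refers to x, RHS to i.
# For i row k, count x rows of the same patient whose window contains hosp.date[k]
swapped <- vapply(seq_along(hosp.date), function(k) {
  sum(patient.id == patient.id[k] &
        start.date <= hosp.date[k] &
        end.date >= hosp.date[k])
}, integer(1))

swapped
```

This gives 1 2 1 1 3 2 1, matching the output above; under that assumption the swapped conditions would count hospitalizations in the 365 days after each stay rather than before it.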
This differs from the result of the first version. Updating in place with the swapped conditions gives:
DT <- copy(DT0)
DT[DT, on = c("patient.id", "start.date<=hospitalization.date",
"end.date>=hospitalization.date"),
hospitalizations.last.year := .N, by = .EACHI]
DT
patient.id hospitalization.date start.date end.date hospitalizations.last.year
1: 1 2013-10-15 2012-10-15 2013-10-15 1
2: 1 2015-07-16 2014-07-16 2015-07-16 2
3: 1 2016-01-07 2015-01-07 2016-01-07 1
4: 2 2014-10-15 2013-10-15 2014-10-15 1
5: 2 2015-12-20 2014-12-20 2015-12-20 3
6: 2 2015-12-25 2014-12-25 2015-12-25 2
7: 2 2016-02-10 2015-02-10 2016-02-10 1
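The same hypothetical emulation can be applied to the swapped conditions, again assuming that `:=` with `by = .EACHI` writes each group's `.N` into the matched rows of x and that the last matching i row wins, and that the left-hand side of each `on` condition refers to x. All names below are stand-ins invented for this sketch.

```r
# Hypothetical stand-ins for DT0's columns (same order as DT0)
patient.id <- c(1L, 1L, 1L, 2L, 2L, 2L, 2L)
hosp.date <- as.Date(c("2013-10-15", "2015-07-16", "2016-01-07",
                       "2014-10-15", "2015-12-20", "2015-12-25", "2016-02-10"))
start.date <- hosp.date - 365
end.date <- hosp.date

# ASSUMPTIONS: x rows matched by i row k are those whose window contains
# hosp.date[k]; each i row writes its .N into them and later i rows overwrite
out.swapped <- rep(NA_integer_, length(hosp.date))
for (k in seq_along(hosp.date)) {
  hits <- which(patient.id == patient.id[k] &
                  start.date <= hosp.date[k] &
                  end.date >= hosp.date[k])
  out.swapped[hits] <- length(hits)
}

out.swapped
```

Under these assumptions it yields 1 2 1 1 3 2 1, the same counts as DT above.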
, github.
x. .