I am doing a left non-equi join using data.table:
OUTPUT <- DT2[DT1, on=.(DOB, FORENAME, SURNAME, POSTCODE, START_DATE <= MONTH, EXPIRY_DATE >= MONTH)]
OUTPUT contains the correct left join, except that the column MONTH (which is present in DT1) is missing.
Is this a bug in data.table?
NB: Of course START_DATE, EXPIRY_DATE and MONTH are in the same YYYY-MM-DD format, IDate. The results of the join are correct based on these non-equi criteria. Only the MONTH column is missing, and I need it for further work.
Edit 1. Simplified reproducible example
DT1 <- structure(list(ID = c(1, 2, 3), FORENAME = c("JOHN", "JACK",
"ROB"), SURNAME = c("JOHNSON", "JACKSON", "ROBINSON"), MONTH = structure(c(16953L,
16953L, 16953L), class = c("IDate", "Date"))), .Names = c("ID",
"FORENAME", "SURNAME", "MONTH"), row.names = c(NA, -3L), class = c("data.table",
"data.frame"))
DT2 <- structure(list(CERT_NUMBER = 999, FORENAME = "JOHN", SURNAME = "JOHNSON",
START_DATE = structure(16801L, class = c("IDate", "Date")),
EXPIRY_DATE = structure(17166L, class = c("IDate", "Date"
))), .Names = c("CERT_NUMBER", "FORENAME", "SURNAME", "START_DATE",
"EXPIRY_DATE"), row.names = c(NA, -1L), class = c("data.table",
"data.frame"))
OUTPUT <- DT2[DT1, on=.(FORENAME, SURNAME, START_DATE <= MONTH, EXPIRY_DATE >= MONTH)]
> OUTPUT
CERT_NUMBER FORENAME SURNAME START_DATE EXPIRY_DATE ID
1: 999 JOHN JOHNSON 2016-06-01 2016-06-01 1
2: NA JACK JACKSON 2016-06-01 2016-06-01 2
3: NA ROB ROBINSON 2016-06-01 2016-06-01 3
FORENAME and SURNAME are joined on and are present in the output. MONTH is also joined on (non-equi) but is absent from the output.
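The output I expect keeps MONTH as it is in DT1, that is, every row and column of the left table (DT1) together with the columns of the right table (DT2):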
CERT_NUMBER FORENAME SURNAME START_DATE EXPIRY_DATE ID MONTH
1: 999 JOHN JOHNSON 2016-01-01 2016-12-31 1 2016-06-01
2: NA JACK JACKSON <NA> <NA> 2 2016-06-01
3: NA ROB ROBINSON <NA> <NA> 3 2016-06-01
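Edit 2: In the actual output, START_DATE and EXPIRY_DATE are wrong too! Person 1 had a certificate from January 1 to December 31, but the output shows June 1 in both columns.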
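A possible workaround, as a minimal sketch rather than a confirmed explanation of the behaviour: as far as I understand, data.table lets j refer to columns of the outer table with the x. prefix and to columns of the inner table with the i. prefix, so the columns can be selected explicitly. The sample data below is rebuilt with data.table() and as.IDate() for brevity, and it assumes a data.table version that supports non-equi joins.

```r
library(data.table)

# Same sample data as above, rebuilt with constructors instead of structure()
DT1 <- data.table(
  ID       = c(1, 2, 3),
  FORENAME = c("JOHN", "JACK", "ROB"),
  SURNAME  = c("JOHNSON", "JACKSON", "ROBINSON"),
  MONTH    = as.IDate("2016-06-01")
)
DT2 <- data.table(
  CERT_NUMBER = 999,
  FORENAME    = "JOHN",
  SURNAME     = "JOHNSON",
  START_DATE  = as.IDate("2016-01-01"),
  EXPIRY_DATE = as.IDate("2016-12-31")
)

# Pick every column explicitly in j:
#   x.<col> takes the value from DT2 (NA where there is no match),
#   i.<col> takes the value from DT1.
OUTPUT <- DT2[DT1,
  on = .(FORENAME, SURNAME, START_DATE <= MONTH, EXPIRY_DATE >= MONTH),
  .(CERT_NUMBER = x.CERT_NUMBER,
    FORENAME    = i.FORENAME,
    SURNAME     = i.SURNAME,
    START_DATE  = x.START_DATE,
    EXPIRY_DATE = x.EXPIRY_DATE,
    ID          = i.ID,
    MONTH       = i.MONTH)]

print(OUTPUT)
```

Naming each column this way is verbose, but it makes it explicit which table every value comes from, which is exactly what is ambiguous in the default output.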