Consider the following data.tables:
library(data.table) ## v1.9.7 (dev version)
dt1 <- data.table(id=c(1,2,3),
                  val=c(3,2,1))
dt2 <- data.table(id=c(1,2,3),
                  val2=c(100,200,300))
No keys have been assigned to these tables:
tables()
Now join them with two different data.table join operations:
dt_merge <- merge(dt1, dt2, by=c("id"))
dt_bracket <- dt1[ dt2, on=c("id")]
tables()
We see that merge() has assigned a key to its result, but X[Y] has not.
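A more direct check than reading the tables() output is key(), which returns the key columns of a data.table or NULL if it has none. The following is a minimal sketch that rebuilds the tables above; the expected values in the comments reflect the behavior described in this question and may differ across data.table versions.

```r
library(data.table)

dt1 <- data.table(id = c(1, 2, 3), val = c(3, 2, 1))
dt2 <- data.table(id = c(1, 2, 3), val2 = c(100, 200, 300))

dt_merge   <- merge(dt1, dt2, by = "id")   # merge() with default arguments
dt_bracket <- dt1[dt2, on = "id"]          # X[Y] join using on=

# key() returns the key column names, or NULL for an unkeyed table
key(dt_merge)    # "id"
key(dt_bracket)  # NULL
```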
If we instead use two data.tables whose join columns have different names, merge() sets the key on the X-side join column (the by.x column):
dt1 <- data.table(id=c(1,2,3),
                  val=c(3,2,1))
dt2 <- data.table(id2=c(1,2,3),
                  val2=c(100,200,300))
dt_merge <- merge(dt1, dt2, by.x=c("id"), by.y=c("id2"))
dt_bracket <- dt1[ dt2, on=c(id = "id2")]
tables()
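The same key() check can be applied to the differently named case. This is a sketch under the assumption that the merged result keeps the X-side column name ("id") for the join column, which matches the tables() output described above.

```r
library(data.table)

dt1 <- data.table(id = c(1, 2, 3), val = c(3, 2, 1))
dt2 <- data.table(id2 = c(1, 2, 3), val2 = c(100, 200, 300))

dt_merge   <- merge(dt1, dt2, by.x = "id", by.y = "id2")
dt_bracket <- dt1[dt2, on = c(id = "id2")]

# The join column takes the X-side name, and that is the column merge() keys on
key(dt_merge)    # "id"
key(dt_bracket)  # NULL
```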
I could not find anything in the FAQ (PDF, item 1.12) or the CRAN documentation that explains why merge() assigns a key.
Since this just caused me several problems with unique(), I am wondering whether this is the expected behavior.
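A likely reason the key matters here: as far as I know, in data.table versions before 1.9.8 (including the 1.9.7 dev version used above), unique() on a keyed data.table deduplicated on the key columns only by default, whereas later versions use all columns. Passing the by argument explicitly avoids depending on that default. The table below is hypothetical example data.

```r
library(data.table)

# Hypothetical keyed table with a duplicated id but different val values
dt <- data.table(id = c(1, 1, 2), val = c(10, 20, 30))
setkey(dt, id)

# Deduplicate on the key column only: one row per id
u_key <- unique(dt, by = key(dt))

# Deduplicate on all columns: rows that differ in val are kept
u_all <- unique(dt, by = names(dt))

nrow(u_key)  # 2
nrow(u_all)  # 3
```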
Update - Solution
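merge() has a sort argument. With the default sort = TRUE, the result is sorted by setting its key to the join columns; with sort = FALSE, no key is set: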
dt_merge_sort <- merge(dt1, dt2, by=c("id"))
dt_merge_notSort <- merge(dt1, dt2, by=c("id"), sort=FALSE)
tables()
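The effect of sort can again be confirmed with key(). This sketch also uses setkey(x, NULL), which removes an existing key, as an alternative when a merged table has already been keyed.

```r
library(data.table)

dt1 <- data.table(id = c(1, 2, 3), val = c(3, 2, 1))
dt2 <- data.table(id = c(1, 2, 3), val2 = c(100, 200, 300))

dt_merge_sort    <- merge(dt1, dt2, by = "id")                # sort = TRUE by default
dt_merge_notSort <- merge(dt1, dt2, by = "id", sort = FALSE)  # no key set

key(dt_merge_sort)     # "id"
key(dt_merge_notSort)  # NULL

# An existing key can also be dropped after the fact
setkey(dt_merge_sort, NULL)
key(dt_merge_sort)     # NULL
```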