R - data.table: different key behavior after merge() and X[Y] joins

Consider these two data.tables:

library(data.table)   ## v1.9.7 (dev version)
#  rm(list=ls())
dt1 <- data.table(id=c(1,2,3),
                val=c(3,2,1))

dt2 <- data.table(id=c(1,2,3),
                val2=c(100,200,300))

Neither table has a key assigned at this point:

tables()
#     NAME NROW NCOL MB COLS    KEY
#[1,] dt1     3    2  1 id,val     
#[2,] dt2     3    2  1 id,val2    
#Total: 2MB

Now join them using two different data.table join operations:

dt_merge <- merge(dt1, dt2, by=c("id"))
dt_bracket <- dt1[ dt2, on=c("id")]

We see that merge() has assigned a key, but X[Y] has not:

tables()
#     NAME       NROW NCOL MB COLS        KEY
#[1,] dt_bracket    3    3  1 id,val,val2    
#[2,] dt_merge      3    3  1 id,val,val2 id 
#[3,] dt1           3    2  1 id,val         
#[4,] dt2           3    2  1 id,val2               
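
The key can also be inspected directly rather than read off the tables() listing. A minimal sketch using key() and haskey() on the same toy tables as above:

```r
library(data.table)

dt1 <- data.table(id = c(1, 2, 3), val = c(3, 2, 1))
dt2 <- data.table(id = c(1, 2, 3), val2 = c(100, 200, 300))

dt_merge   <- merge(dt1, dt2, by = "id")  # merge() with its default arguments
dt_bracket <- dt1[dt2, on = "id"]         # X[Y] join on the same column

key(dt_merge)       # "id"
haskey(dt_bracket)  # FALSE
```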

If we then use two data.tables whose join columns have different names, merge() sets the key to the join column of X:

#  rm(list=ls())
dt1 <- data.table(id=c(1,2,3),
                val=c(3,2,1))

dt2 <- data.table(id2=c(1,2,3),
                val2=c(100,200,300))


dt_merge <- merge(dt1, dt2, by.x=c("id"), by.y=c("id2"))
dt_bracket <- dt1[ dt2, on=c(id = "id2")]
tables()
#     NAME       NROW NCOL MB COLS        KEY
#[1,] dt_bracket    3    3  1 id,val,val2    
#[2,] dt_merge      3    3  1 id,val,val2 id 
#[3,] dt1           3    2  1 id,val         
#[4,] dt2           3    2  1 id2,val2       

I could not find anything in the FAQ PDF (1.12) or in the CRAN documentation that explains why a key is assigned after merge().

Since this just caused me several problems with unique(), I was wondering whether this is the expected behavior.
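
The post does not say what went wrong with unique(). One possible cause, assumed here rather than confirmed by the question, is that data.table releases before 1.9.8 used by = key(x) as the default in unique(), so a keyed table was de-duplicated on its key columns only. Passing by explicitly shows the difference on a made-up table:

```r
library(data.table)

# Hypothetical data: id repeats, but no full row is duplicated.
dt <- data.table(id = c(1, 1, 2), val = c(10, 20, 30))
setkey(dt, id)

nrow(unique(dt, by = key(dt)))    # 2: rows are compared on the key column only
nrow(unique(dt, by = names(dt)))  # 3: rows are compared on all columns
```

Spelling out by makes the result independent of whether an earlier merge() happened to set a key.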

Update - Solution

The key is controlled by the sort argument of merge():

dt_merge_sort <- merge(dt1, dt2, by=c("id"))
dt_merge_notSort <- merge(dt1, dt2, by=c("id"), sort=FALSE)

tables()

#     NAME             NROW NCOL MB COLS        KEY
#[1,] dt1                 3    2  1 id,val         
#[2,] dt2                 3    2  1 id,val2        
#[3,] dt_merge_notSort    3    3  1 id,val,val2    
#[4,] dt_merge_sort       3    3  1 id,val,val2 id 

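Yes, this is expected. From ?merge.data.table, on the sort argument:

If TRUE (default), the merged data.table is sorted by setting the key to the by / by.x columns. If FALSE, the result is not sorted.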
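
A short sketch of both settings, plus one way to drop a key that merge() has already set. The tables are made-up toy data, with dt1 deliberately not in id order:

```r
library(data.table)

# Hypothetical toy data; dt1 is not sorted by id.
dt1 <- data.table(id = c(3, 1, 2), val = c(1, 3, 2))
dt2 <- data.table(id = c(1, 2, 3), val2 = c(100, 200, 300))

sorted   <- merge(dt1, dt2, by = "id")                # default sort = TRUE
unsorted <- merge(dt1, dt2, by = "id", sort = FALSE)

key(sorted)    # "id": the result is keyed, hence ordered by id
key(unsorted)  # NULL: no key is set

# A key set by merge() can be removed afterwards if it is not wanted.
setkey(sorted, NULL)
haskey(sorted)  # FALSE
```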

Source: https://habr.com/ru/post/1626464/

