In R using data.table, how does one subset or query when the criteria field is an integer?

I use the data.table package quite a bit. There are many examples of subsetting (or querying, or searching, whatever you want to call it) using binary search, which seems to be much faster than vector scanning. Here is an excerpt from the help file.

    DT["a"]       # binary search (fast)
    DT[x=="a"]    # vector scan (slow)

But what happens if the column you want to search on is not a factor (or character), but an integer?

    library(data.table)
    cpt <- c(23456,23456,10000,44555,44555)
    description <- c("tonsillectomy","tonsillectomy in >12 year old","brain transplant","castration","orchidectomy")
    cpt.desc <- data.table(cpt,description)
    setkey(cpt.desc,cpt)   # key the table on cpt
    cpt.desc[10000,]       # attempt to look up cpt 10000

This does not work, because the integer 10000 is interpreted as the 10000th row, which does not exist in this data table.

If we change the syntax, we get what we are looking for.

    cpt.desc[cpt==10000,]

However, this looks like the slow vector scan method. Is there a way to do a binary search on an integer key in the data.table package? Thanks in advance for your help.

1 answer

Try cpt.desc[J(10000)]. Add , mult="all" to get all matches.
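
As a minimal sketch, here is that suggestion applied to the data from the question, assuming a reasonably recent version of data.table. The result names (one, all_rows, first_row, none) are only illustrative. In current versions mult="all" is already the default, so spelling it out mainly documents the intent.

```r
library(data.table)

# Same data as in the question; c() without an L suffix yields doubles.
cpt <- c(23456, 23456, 10000, 44555, 44555)
description <- c("tonsillectomy", "tonsillectomy in >12 year old",
                 "brain transplant", "castration", "orchidectomy")
cpt.desc <- data.table(cpt, description)
setkey(cpt.desc, cpt)   # sorts by cpt and marks it as the key

# Keyed join: J() wraps the value so it is matched against the key
# instead of being treated as a row number.
one <- cpt.desc[J(10000)]

# mult controls which matches come back when a key value repeats.
all_rows  <- cpt.desc[J(23456), mult = "all"]
first_row <- cpt.desc[J(23456), mult = "first"]

# A value absent from the key gives an NA row by default;
# nomatch = 0L drops it instead.
none <- cpt.desc[J(99999), nomatch = 0L]

print(one)
print(all_rows)
```

The bare cpt.desc[10000,] form from the question is still read as a row number, so the J() wrapper is what turns the lookup into a join on the key.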


Source: https://habr.com/ru/post/1391714/

