In my work, I use several tables (client information, transaction records, etc.). Some of them are very large (millions of rows), so I recently switched to the data.table package (thanks, Matthew). However, some of them are quite small (a few hundred rows and 4 or 5 columns) and are accessed many times. So I started thinking about the overhead of [.data.table when retrieving data, rather than when setting a value with set(), which is already clearly described in ?set: regardless of the size of the table, one element is set in about 2 microseconds (depending on the processor).
However, there does not seem to be an equivalent of set for getting a value from a data.table when the exact row and column are known. Something like a loopable [.data.table.
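To make concrete what I mean, here is a minimal sketch on a tiny made-up table (the name small and its contents are hypothetical, chosen only for illustration). Writing one cell has a dedicated low-overhead function, set(), while reading the same cell has to go through [.data.table or through column extraction with [[:

```r
library(data.table)

# Tiny illustrative table (hypothetical data, not one of my real tables)
small <- data.table(V1 = c(10, 20, 30), V2 = c(1, 2, 3))

# Write one cell by reference: row 2, column 1
set(small, i = 2L, j = 1L, value = 99)

# Read the same cell back. There is no get counterpart to set(),
# so the options are [.data.table or list-style column extraction.
a <- small[2L, V1]      # dispatches to [.data.table
b <- small[["V1"]][2L]  # pulls out the column vector, then indexes it

stopifnot(a == 99, b == 99)
```

Both reads return the value just written; the question is only about how much time each form costs when it is called many times.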
    library(data.table)
    library(microbenchmark)
    m = matrix(1,nrow=100000,ncol=100)
    DF = as.data.frame(m)
    DT = as.data.table(m)
The last method is indeed the best way to quickly get a single element several times. However, set is even faster:
    > microbenchmark(set(DT,1L,1L,5L), times=1000)
    Unit: microseconds
                    expr   min    lq median    uq    max neval
     set(DT, 1L, 1L, 5L) 1.955 1.956  2.444 2.444 24.926  1000
Question: if we can set a value in 2.444 microseconds, shouldn't it be possible to get a value in a smaller (or at least similar) amount of time? Thanks.
EDIT: adding two more options:
    > microbenchmark(`[.data.frame`(DT,3450,1), DT[["V1"]][3450], times=1000)
    Unit: microseconds
                            expr    min     lq median     uq      max neval
     `[.data.frame`(DT, 3450, 1) 46.428 47.895 48.383 48.872 2165.509  1000
                DT[["V1"]][3450] 20.038 21.504 23.459 24.437  116.316  1000
Unfortunately, neither is faster than the previous attempts.