Shift () in data.table v1.9.6 is slow for many groups

Thank you for implementing the transition in dt1.9.6. When I have many different groups, shift()against expectations is slower than my old code:

library(data.table)
library(microbenchmark)
set.seed(1)
mg <- data.table(expand.grid(year = 2012:2016, id = 1:1000),
                 value = rnorm(5000))
microbenchmark(dt194 = mg[, l1 := c(value[-1], NA), by = .(id)],
           dt196 = mg[, l2 := shift(value, n = 1,
                               type = "lead"), by = .(id)])
## Unit: milliseconds
##   expr      min        lq      mean    median       uq        max  eval
##  dt194  4.93735  5.236034  5.718654  5.623736  5.74395   9.555922   100
##  dt196 83.92612 87.530404 91.700317 90.953947 91.43783 257.473242   100

A detailed script is here: https://github.com/nachti/datatable_test/blob/master/leadtest.R

Did I use it wrong shift()?

Edit: Avoid :=not helping ( @MichaelChirico ):

microbenchmark(dt194 = mg[, c(value[-1], NA), by = id],
               dt196 = mg[, shift(value, n = 1,
                                   type = "lead"), by = id])

## Unit: milliseconds
##   expr       min        lq     mean    median        uq       max neval
##  dt194  5.161973  5.429927  5.78047  5.698263  5.798132  10.42217   100
##  dt196 79.526981 87.914502 92.18144 91.240949 91.896799 266.04031   100

In addition, use :=is part of the task ...

+4
source share

Source: https://habr.com/ru/post/1627164/


All Articles