Is there a performance advantage for multiple data.table assignments in a single expression?

The following two data.table expressions produce equivalent results:

library(data.table)

dt1 <- data.table(iris)
dt1[, Long.Petal := Petal.Length > mean(Petal.Length)]
dt1[, Wide.Petal := Petal.Width > mean(Petal.Width)]

and

dt2 <- data.table(iris)
dt2[, `:=`(
  Long.Petal = Petal.Length > mean(Petal.Length),
  Wide.Petal = Petal.Width > mean(Petal.Width)
)]

When working with a large data set, is there a performance advantage (in terms of memory or runtime, or both) to the latter form? Or is the overhead minimal, making it just a matter of style and readability?

2 answers

There are two things to consider: a) the overhead of calling [.data.table, and b) the cost of running the code inside [.data.table.

Unless you call it hundreds or thousands of times (e.g., inside a for-loop), the overhead of dispatching [.data.table is negligible next to the work done inside it. If you do call it that often, the repeated [.data.table dispatch starts to add up, and set() is the faster alternative.
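
As a rough illustration of that last point, here is a minimal sketch (the table, column names, and loop length are invented for the example) contrasting repeated := calls with set():

library(data.table)

dt <- data.table(matrix(0, nrow = 1e4, ncol = 100))  # columns V1..V100
cols <- names(dt)

# one [.data.table call per iteration: dispatch overhead is paid 100 times
for (j in seq_along(cols)) dt[, (cols[j]) := rnorm(.N)]

# set() bypasses [.data.table dispatch, so the per-iteration overhead is minimal
for (j in seq_along(cols)) set(dt, j = cols[j], value = rnorm(nrow(dt)))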

Otherwise the difference is unlikely to matter. If in doubt, profile your code with Rprof(); <your_code>; Rprof(NULL); summaryRprof() to see where the time is actually being spent.
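
For instance, the two-statement version from the question could be profiled like this (a minimal sketch; on a data set as small as iris the profile will be nearly empty, so substitute your own large table):

library(data.table)

dt1 <- data.table(iris)

Rprof()        # start writing profiling samples to Rprof.out
dt1[, Long.Petal := Petal.Length > mean(Petal.Length)]
dt1[, Wide.Petal := Petal.Width > mean(Petal.Width)]
Rprof(NULL)    # stop profiling
summaryRprof() # report where the time was spent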


You can simply benchmark the two forms. For example:

library(data.table)
library(microbenchmark)

set.seed(42)
dt1 <- data.table(x = rnorm(1e7))
dt2 <- copy(dt1)

microbenchmark({dt1[, y := x < 0]; dt1[, z := x > 0]},
               dt2[,`:=`(
                 y = x < 0,
                 z = x > 0
               )])
#Unit: milliseconds
#                                                   expr      min       lq     mean   median       uq      max neval cld
#{     dt1[, `:=`(y, x < 0)]     dt1[, `:=`(z, x > 0)] } 122.6285 124.0237 143.3914 125.2057 146.0050 305.3609   100  a 
#                      dt2[, `:=`(y = x < 0, z = x > 0)] 153.2545 156.5720 208.5669 178.9714 301.8305 359.2821   100   b

all.equal(dt1, dt2)
#[1] TRUE

Source: https://habr.com/ru/post/1619223/

