data.table automatically removes NA in by for the mean function?

Today I found a bug in my program caused by data.table automatically removing NA values when computing mean within groups.

For example:

    > a <- data.table(a=c(NA,NA,FALSE,FALSE), b=c(1,1,2,2))
    > a
    > a[,list(mean(a), sum(a)),by=b]
       b V1 V2
    1: 1  0 NA    # Why V1 = 0 here? I had expected NA
    2: 2  0  0
    > mean(c(NA,NA,FALSE,FALSE))
    [1] NA
    > mean(c(NA,NA))
    [1] NA
    > mean(c(FALSE,FALSE))
    [1] 0

Is this the intended behavior?

1 answer

This is not intended. Looks like an optimization issue ...

    > a[,list(mean(a), sum(a)),by=b]
       b V1 V2
    1: 1  0 NA
    2: 2  0  0
    > options(datatable.optimize=FALSE)
    > a[,list(mean(a), sum(a)),by=b]
       b V1 V2
    1: 1 NA NA
    2: 2  0  0

Investigated and now fixed in v1.8.9, which will be on CRAN soon. From NEWS:

mean() in j has been optimized since v1.8.2, but was not respecting na.rm = TRUE (the default). Many thanks to Colin Fang for reporting. Test added.

The new feature in v1.8.2 was:

mean() is now automatically optimized, #1231. This can speed up grouping by 20 times when there are a large number of groups. See wiki point 3, which you no longer need to know about. Turn off optimization by setting options(datatable.optimize = 0).
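If you want to check whether your installed version is affected, something like the sketch below may help. It runs the grouped query from the question once with optimization turned off and once with the current setting, then compares the two results. The names dt, res_plain and res_opt are just illustrative, and the comparison only shows agreement on this one small example, not in general.

```r
library(data.table)

# Same data as in the question
dt <- data.table(a = c(NA, NA, FALSE, FALSE), b = c(1, 1, 2, 2))

# Temporarily turn off j optimization; options() returns the previous values
old <- options(datatable.optimize = 0)
res_plain <- dt[, list(m = mean(a), s = sum(a)), by = b]
options(old)  # restore whatever setting was active before

# Same query with the optimization setting that was active before
res_opt <- dt[, list(m = mean(a), s = sum(a)), by = b]

# On a version that contains the fix, both results should agree
print(isTRUE(all.equal(res_plain, res_opt)))
```

Restoring the old option value right after the unoptimized call keeps the slowdown limited to that one query instead of the whole session.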


Source: https://habr.com/ru/post/1500126/

