UPDATE: This was fixed quickly. See below.
Here is some interesting behavior I found with data.table 1.8.11 (r1101, 2014-01-28). The order of the variables included in the by clause changes the aggregation results:
> foo = data.table(a=rep(c(0,1,0,1),2), b=rep(c(T,T,F,F),2), c=c(1,1,1,1,1,1,1,1))
> foo
   a     b c
1: 0  TRUE 1
2: 1  TRUE 1
3: 0 FALSE 1
4: 1 FALSE 1
5: 0  TRUE 1
6: 1  TRUE 1
7: 0 FALSE 1
8: 1 FALSE 1
> foo[, .N, by=list(b, a)]
       b a N
1:  TRUE 0 1
2:  TRUE 1 1
3: FALSE 0 1
4: FALSE 1 1
5:  TRUE 0 1
6:  TRUE 1 1
7: FALSE 0 1
8: FALSE 1 1
> foo[, .N, by=list(a, b)]
   a     b N
1: 0  TRUE 2
2: 1  TRUE 2
3: 0 FALSE 2
4: 1 FALSE 2
>
This does not happen in the stable release of data.table (1.8.10).
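For anyone who wants to check their own installation, here is a minimal sketch of a sanity test. It assumes the intended behavior is what 1.8.10 does: both orderings of the by columns yield the same four groups with N == 2. The names n_ab and n_ba are just illustrative.

```r
library(data.table)

# Same data as in the question, with TRUE/FALSE spelled out.
foo <- data.table(a = rep(c(0, 1, 0, 1), 2),
                  b = rep(c(TRUE, TRUE, FALSE, FALSE), 2),
                  c = rep(1, 8))

# Count rows per group using both orderings of the by columns.
n_ab <- foo[, .N, by = list(a, b)]
n_ba <- foo[, .N, by = list(b, a)]

# Bring both results into the same row and column order before comparing.
n_ab <- n_ab[order(a, b), list(a, b, N)]
n_ba <- n_ba[order(a, b), list(a, b, N)]

# These checks should pass on an unaffected version and
# should fail on a build that shows the behavior above.
stopifnot(nrow(n_ab) == 4L, nrow(n_ba) == 4L)
stopifnot(isTRUE(all.equal(n_ab, n_ba)))
```

If the second stopifnot() fails, the grouping depends on the order of the by columns, which is the behavior reported here.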