Data.table 1.8.11 and problems with aggregation

Question

UPDATE: This was fixed quickly. See the answer below.

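Here is some surprising behavior I found with data.table 1.8.11 (r1101, 2014-01-28): the order of the variables in the by clause changes the aggregation result. With by=list(b, a), identical (b, a) pairs are not combined, so each of the 8 rows comes back as its own group with N = 1. With by=list(a, b), the four groups are counted correctly with N = 2: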

>   foo = data.table(a=rep(c(0,1,0,1),2), b=rep(c(T,T,F,F),2), c=c(1,1,1,1,1,1,1,1))
>   foo
   a     b c
1: 0  TRUE 1
2: 1  TRUE 1
3: 0 FALSE 1
4: 1 FALSE 1
5: 0  TRUE 1
6: 1  TRUE 1
7: 0 FALSE 1
8: 1 FALSE 1
>   foo[, .N, by=list(b, a)]
       b a N
1:  TRUE 0 1
2:  TRUE 1 1
3: FALSE 0 1
4: FALSE 1 1
5:  TRUE 0 1
6:  TRUE 1 1
7: FALSE 0 1
8: FALSE 1 1
>   foo[, .N, by=list(a, b)]
   a     b N
1: 0  TRUE 2
2: 1  TRUE 2
3: 0 FALSE 2
4: 1 FALSE 2
>

This does not happen in the stable release of data.table (1.8.10).
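To confirm which of the two results is the expected one, the counts can be cross-checked in base R without data.table. This is only a sketch: foo_df and counts are names chosen for illustration, and table() stands in for the .N aggregation.

```r
# Base R cross-check, independent of data.table: count rows per (a, b) pair.
# foo_df mirrors the foo table from the question (illustrative name).
foo_df <- data.frame(
  a = rep(c(0, 1, 0, 1), 2),
  b = rep(c(TRUE, TRUE, FALSE, FALSE), 2),
  c = rep(1, 8)
)

# table() tabulates every combination of a and b;
# responseName renames the count column from the default "Freq" to "N".
counts <- as.data.frame(table(a = foo_df$a, b = foo_df$b), responseName = "N")
print(counts)

# Each of the four combinations should occur the same number of times.
stopifnot(nrow(counts) == 4, length(unique(counts$N)) == 1)
```

If the check passes, every (a, b) combination has the same count, which matches the by=list(a, b) output rather than the by=list(b, a) output.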

r data.table

Clayton stanley Jan 29 '14 at 16:55

1 answer

Arun · Accepted Answer · 2014-01-29T22:51:48+0000

Thanks for reporting. This is now fixed in v1.8.11 commit 1103. The NEWS entry for the fix (#5307) refers to fastorder and credits Clayton Stanley's SO question on data.table 1.8.11.

require(data.table) # commit 1103 v1.8.11
foo[, .N, by=list(b,a)]
       b a N
1:  TRUE 0 2
2:  TRUE 1 2
3: FALSE 0 2
4: FALSE 1 2

foo[, .N, by=list(a,b)]
   a     b N
1: 0  TRUE 2
2: 1  TRUE 2
3: 0 FALSE 2
4: 1 FALSE 2
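A quick way to test any build for this kind of regression is to check that the grouped counts do not depend on the order of the by columns. The following is a minimal sketch, assuming data.table is installed; by_ba, by_ab and same_result are illustrative names, not part of the package.

```r
library(data.table)

foo <- data.table(
  a = rep(c(0, 1, 0, 1), 2),
  b = rep(c(TRUE, TRUE, FALSE, FALSE), 2),
  c = rep(1, 8)
)

# Group with both orderings of the by columns.
by_ba <- foo[, .N, by = list(b, a)]
by_ab <- foo[, .N, by = list(a, b)]

# Bring both results to the same row order and column order.
by_ba <- by_ba[order(a, b), list(a, b, N)]
by_ab <- by_ab[order(a, b), list(a, b, N)]

# Compare as plain data frames so data.table attributes do not interfere.
same_result <- isTRUE(all.equal(as.data.frame(by_ba), as.data.frame(by_ab)))

# Fails on a build where the by order changes the grouping.
stopifnot(same_result)
```

Sorting both results before comparing matters because by returns groups in order of first appearance, which can differ between the two calls even when the counts agree.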
