aggregate uses tapply, which in turn calls factor on its grouping variable.
But look at what happens to NA values in factor:
    factor(c(1, 2, NA))
    # [1] 1    2    <NA>
    # Levels: 1 2
Pay attention to the levels: NA is not one of them, so rows where the grouping variable is NA do not form a group. You can use addNA to keep NA as a level:
    addNA(factor(c(1, 2, NA)))
    # [1] 1    2    <NA>
    # Levels: 1 2 <NA>
So you probably need to do something like:
    aggregate(y ~ addNA(x), d, sum)
    #   addNA(x) y
    # 1        1 2
    # 2     <NA> 3
Or something like:
    d$x <- addNA(factor(d$x))
    str(d)
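To try both variants end to end, here is a minimal sketch. The data frame `d` is not shown in this answer, so the one below is a made-up example chosen only so that the sums match the output above; the names `res_default` and `res_addna` are likewise just for illustration.

```r
# Hypothetical data (an assumption, not the asker's actual data frame)
d <- data.frame(x = c(1, 1, NA, NA), y = c(1, 1, 1, 2))

# Default behaviour: NA is not a factor level, so the NA rows are left out
res_default <- aggregate(y ~ x, d, sum)

# With addNA(), NA becomes a level of its own and gets its own group
res_addna <- aggregate(y ~ addNA(x), d, sum)

print(res_default)
print(res_addna)
```

Comparing the two results should show one group without addNA and two groups with it.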
(Alternatively, switch to something like "data.table", which is not only faster than aggregate but also gives you more consistent behavior with NA values. You also don't need to pay attention to whether you are using the formula method of aggregate or not.)
    library(data.table)
    as.data.table(d)[, sum(y), by = x]
    #     x V1
    # 1:  1  2
    # 2: NA  3