aggregate with na.action = na.pass gives an unexpected answer

Question

As an example, take the following data.frame:

d <- data.frame(x=c(1,NA), y=c(2,3))

I would like to sum the y values grouped by the variable x. Since no value of x is repeated, I would expect aggregate to simply give me the original data.frame back, with NA treated as its own group. But aggregate gives me the following result:

 > aggregate(y ~ x, data=d, FUN=sum)
   x y
 1 1 2

I read in the documentation about changing the default na.action, but that does not seem to make any difference:

 > aggregate(y ~ x, data=d, FUN=sum, na.action=na.pass)
   x y
 1 1 2

What's happening? It seems like I don't understand what na.pass does in this case. Is it possible to accomplish what I want in R? Any help would be greatly appreciated.

+5

r aggregate na

Sanias Nov 18 '15 at 15:21

1 answer

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2015-11-18T15:38:47+0000

aggregate makes use of tapply, which in turn calls factor on its grouping variable.

But look at what happens to NA values in factor:

 factor(c(1, 2, NA))
 # [1] 1    2    <NA>
 # Levels: 1 2

Note the levels: NA is not one of them. You can use addNA to keep NA as a level:

 addNA(factor(c(1, 2, NA)))
 # [1] 1    2    <NA>
 # Levels: 1 2 <NA>
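The same effect can be seen one level down, in tapply itself. The following is only a minimal sketch: the vectors x and y are illustrative stand-ins that mirror the columns of d, and the names plain and kept are made up for this example.

```r
# Illustrative vectors mirroring the columns of d (assumption, not from the post)
x <- c(1, NA)
y <- c(2, 3)

# Grouping on the bare vector: factor() gives NA no level, so that group is dropped
plain <- tapply(y, x, sum)
length(plain)   # 1 group

# Grouping on addNA(x): NA is an explicit level, so its group survives
kept <- tapply(y, addNA(x), sum)
length(kept)    # 2 groups
```
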

So you probably need to do something like:

 aggregate(y ~ addNA(x), d, sum)
 #   addNA(x) y
 # 1        1 2
 # 2     <NA> 3

Or something like:

 d$x <- addNA(factor(d$x))
 str(d)
 # 'data.frame': 2 obs. of 2 variables:
 #  $ x: Factor w/ 2 levels "1",NA: 1 2
 #  $ y: num 2 3
 aggregate(y ~ x, d, sum)
 #      x y
 # 1    1 2
 # 2 <NA> 3

(Alternatively, move to something like "data.table", which would not only be faster than aggregate but would also give you more consistent behavior with NA values. You would also not have to care whether you are using the formula method of aggregate or not.)

 library(data.table)
 as.data.table(d)[, sum(y), by = x]
 #     x V1
 # 1:  1  2
 # 2: NA  3
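As for what na.action = na.pass actually changes: as far as I can tell, it only controls which rows survive the model-frame step of the formula method, while rows with NA in the grouping variable are still dropped later when the groups are formed. A small sketch under that assumption, using a made-up data.frame d2 (not from the question) that has an NA in y as well as in x:

```r
# Hypothetical data: row 2 has NA in y, row 3 has NA in x
d2 <- data.frame(x = c(1, 1, NA), y = c(2, NA, 3))

# Default na.action = na.omit: rows 2 and 3 are removed before grouping,
# so the x = 1 group contains a single value
res_omit <- aggregate(y ~ x, data = d2, FUN = length)

# na.pass: row 2 is kept and reaches FUN (the x = 1 group now has two values),
# but row 3 is still dropped because its grouping value is NA
res_pass <- aggregate(y ~ x, data = d2, FUN = length, na.action = na.pass)
```

So na.pass matters for missing values in y, but on its own it does not make NA a group in x; for that you still need addNA.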
