aggregate with na.action = na.pass gives an unexpected answer

As an example, I use the following data.frame:

d <- data.frame(x=c(1,NA), y=c(2,3)) 

I would like to aggregate the y values by the variable x. Since no value of x occurs more than once, I would expect aggregate to just give me the original data.frame back, with NA treated as a group of its own. But aggregate gives me the following result.

 > aggregate(y ~ x, data=d, FUN=sum)
   x y
 1 1 2

I read in the documentation about changing the default na.action, but that doesn't seem to make any difference.

 > aggregate(y ~ x, data=d, FUN=sum, na.action=na.pass)
   x y
 1 1 2

What's happening? It seems like I don't understand what na.pass does in this case. Is it possible to accomplish what I want in R? Any help would be greatly appreciated.

1 answer

aggregate uses tapply, which in turn uses factor on its grouping variable.

But look at what happens to NA values in factor:

 factor(c(1, 2, NA))
 # [1] 1    2    <NA>
 # Levels: 1 2

Pay attention to the levels: NA is not one of them. You can use addNA to keep NA as a level:

 addNA(factor(c(1, 2, NA)))
 # [1] 1    2    <NA>
 # Levels: 1 2 <NA>
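To see the effect on grouping directly, here is a minimal sketch that calls tapply on the data from the question, once with the raw grouping variable and once wrapped in addNA. The variable names plain and with_na are only illustrative.

```r
# Toy data from the question: one known group and one NA group.
y <- c(2, 3)
x <- c(1, NA)

# Plain grouping: factor() leaves NA out of the levels,
# so only the group "1" survives.
plain <- tapply(y, x, sum)

# Wrapping the grouping variable in addNA() keeps NA as a level of its own.
with_na <- tapply(y, addNA(x), sum)

length(plain)    # number of groups without addNA
length(with_na)  # number of groups with addNA
```

The first call ends up with a single group and the second with two, which is the same difference that shows up in aggregate.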

So you probably need to do something like:

 aggregate(y ~ addNA(x), d, sum)
 #   addNA(x) y
 # 1        1 2
 # 2     <NA> 3

Or something like:

 d$x <- addNA(factor(d$x))
 str(d)
 # 'data.frame':	2 obs. of  2 variables:
 #  $ x: Factor w/ 2 levels "1",NA: 1 2
 #  $ y: num  2 3
 aggregate(y ~ x, d, sum)
 #      x y
 # 1    1 2
 # 2 <NA> 3
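The same trick should also carry over to the non-formula interface of aggregate, where the grouping variables are passed as a list through the by argument. This is a sketch on the data from the question; the name res is just for illustration.

```r
# Data from the question.
d <- data.frame(x = c(1, NA), y = c(2, 3))

# Non-formula interface: pass the grouping variable in `by`,
# wrapped in addNA() so that NA forms its own group.
res <- aggregate(d["y"], by = list(x = addNA(d$x)), FUN = sum)

nrow(res)  # one row per group, including the NA group
res
```

Naming the list element x keeps the grouping column labelled x in the result instead of the default Group.1.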

(Alternatively, upgrade to something like "data.table", which will not only be faster than aggregate, but will also give you more consistent behavior with NA values. You then no longer need to pay attention to whether you are using the formula method of aggregate or not.)

 library(data.table)
 as.data.table(d)[, sum(y), by = x]
 #     x V1
 # 1:  1  2
 # 2: NA  3

Source: https://habr.com/ru/post/1236244/

