Using dplyr summarise_each() with is.na()

Question

I am trying to wrap some dplyr magic inside a function to produce a data.frame that I then print using xtable.

The ultimate goal is to have a dplyr version of this, and while reading around I came across the very useful summarise_each() function. After subsetting with regroup() (since this happens inside a function), I can use it to process all columns.

The problem I have hit (so far) is calling is.na() from within summarise_each(funs(is.na)), which gives me Error: expecting a single value.

I am purposefully not posting my whole function just yet, but a minimal example follows (NB: this uses group_by(), whereas in my function I replace it with regroup()) ...

    > library(dplyr)
    > library(magrittr)
    > t <- data.frame(grp = rbinom(10, 1, 0.5),
    +                 a = as.factor(round(rnorm(10))),
    +                 b = rnorm(10),
    +                 c = rnorm(10))
    > t %>%
    +   group_by(grp) %>%  ## This is replaced with regroup() in my function
    +   summarise_each(funs(is.na))
    Error: expecting a single value

This fails, and it is the call to is.na() that is the problem, because if I instead work out the number of observations in each column (needed to derive the proportion missing), it works ...

    > t %>%
    +   group_by(grp) %>%  ## This is replaced with regroup() in my function
    +   summarise_each(funs(length))
    Source: local data frame [2 x 4]

      grp a b c
    1   0 8 8 8
    2   1 2 2 2

The real problem, though, is that I don't just need is.na() within each column but sum(is.na()), as in the linked example, so what I would really like is ...

    > t %>%
    +   group_by(grp) %>%  ## This is replaced with regroup() in my function
    +   summarise_each(funs(propmiss = sum(is.na) / length))

But the problem is that sum(is.na) does not work as I expect (probably because my expectation is wrong!) ...

    > t %>%
    +   group_by(grp) %>%  ## This is replaced with regroup() in my function
    +   summarise_each(funs(nmiss = sum(is.na)))
    Error in sum(.Primitive("is.na")) : invalid 'type' (builtin) of argument

I tried calling is.na() explicitly with parentheses, but this also returns an error ...

    > t %>%
    +   group_by(grp) %>%  ## This is replaced with regroup() in my function
    +   summarise_each(funs(nmiss = sum(is.na())))
    Error in is.na() : 0 arguments passed to 'is.na' which requires 1

Any advice or pointers to documentation would be greatly appreciated.

Thanks,

slackline

r dplyr

slackline 24 sept '14 at 13:01

1 answer

Henrik · Accepted Answer · 2014-09-24T13:53:08+0000

Here's a possibility, tested on a small data set with some NA values:

    df <- data.frame(a = rep(1:2, each = 3),
                     b = c(1, 1, NA, 1, NA, NA),
                     c = c(1, 1, 1, NA, NA, NA))
    df
    #   a  b  c
    # 1 1  1  1
    # 2 1  1  1
    # 3 1 NA  1
    # 4 2  1 NA
    # 5 2 NA NA
    # 6 2 NA NA

    df %>%
      group_by(a) %>%
      summarise_each(funs(sum(is.na(.)) / length(.)))
    #   a         b c
    # 1 1 0.3333333 0
    # 2 2 0.6666667 1
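For readers on a newer dplyr, where summarise_each() and funs() have since been deprecated, the same per-group proportion can probably be written with across(). This is only a sketch, not part of the original answer: it assumes dplyr 1.0.0 or later, and the name propmiss is borrowed from the question purely for illustration.

```r
library(dplyr)

df <- data.frame(a = rep(1:2, each = 3),
                 b = c(1, 1, NA, 1, NA, NA),
                 c = c(1, 1, 1, NA, NA, NA))

# The mean of a logical vector is the proportion of TRUE values,
# so mean(is.na(x)) is the same as sum(is.na(x)) / length(x).
# Assumption: dplyr >= 1.0.0, which provides across() and everything().
propmiss <- df %>%
  group_by(a) %>%
  summarise(across(everything(), ~ mean(is.na(.x))))

print(propmiss)
```

Inside a grouped summarise(), across(everything()) skips the grouping column, so only b and c are summarised.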

And since you asked for pointers to documentation: the . refers to each piece of the data, and it is used in some of the examples in ?summarize_each. It is described in the Arguments section of ?funs as a "dummy parameter" and used in the Examples there. The . is also briefly described in the Arguments section of ?do: "... you can use . to refer to the current group".
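One way to picture the role of . is as the argument of an ordinary function that receives one column of one group at a time. The base R sketch below is only an illustration of that idea, not how dplyr works internally; the names prop_missing, pieces, result and err are made up for this example. It also shows why sum(is.na) failed in the question: there, is.na is passed as a function object instead of being called on data.

```r
df <- data.frame(a = rep(1:2, each = 3),
                 b = c(1, 1, NA, 1, NA, NA),
                 c = c(1, 1, 1, NA, NA, NA))

# x plays the role of `.`: one column vector from one group.
prop_missing <- function(x) sum(is.na(x)) / length(x)

# Split the data columns by group, then apply the function to each column
# of each piece. The result is a matrix with one row per column (b, c)
# and one column per group ("1", "2").
pieces <- split(df[c("b", "c")], df$a)
result <- sapply(pieces, function(piece) sapply(piece, prop_missing))
print(result)

# Passing the function itself to sum() is an error, because sum() needs
# values, not a function; the error message is captured here.
err <- tryCatch(sum(is.na), error = function(e) conditionMessage(e))
print(err)
```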
