The number of non-NA entries by column grouped

Question

I have data.table that looks something like this:

library(data.table)

dt <- data.table(
  group1 = c("a", "a", "a", "b", "b", "b", "b"),
  group2 = c("x", "x", "y", "y", "z", "z", "z"),
  data1 = c(NA, rep(TRUE, 3), rep(FALSE, 2), "sometimes"),
  data2 = c("sometimes", rep(FALSE, 3), rep(TRUE, 2), NA)
)

dt

   group1 group2     data1     data2
1:      a      x        NA sometimes
2:      a      x      TRUE     FALSE
3:      a      y      TRUE     FALSE
4:      b      y      TRUE     FALSE
5:      b      z     FALSE      TRUE
6:      b      z     FALSE      TRUE
7:      b      z sometimes        NA

My goal is to find the number of non-NA entries in each data column, grouped by group1 and group2:

   group1 group2     data1     data2
1:      a      x         1         2
2:      a      y         1         1
3:      b      y         1         1
4:      b      z         3         2

I have this code left over from working with another part of the data set, which had no NA values and was logical:

dt[
  ,
  lapply(.SD, sum),
  by = list(group1, group2),
  .SDcols = c("data3", "data4")
]

But it does not work with NA values or with non-logical values.
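The non-logical values are the real obstacle here. Because data1 and data2 mix logical values with the string "sometimes", R coerces each whole column to character, so sum() has nothing numeric to add. A minimal check, assuming the same dt as above:

```r
library(data.table)

dt <- data.table(
  group1 = c("a", "a", "a", "b", "b", "b", "b"),
  group2 = c("x", "x", "y", "y", "z", "z", "z"),
  data1 = c(NA, rep(TRUE, 3), rep(FALSE, 2), "sometimes"),
  data2 = c("sometimes", rep(FALSE, 3), rep(TRUE, 2), NA)
)

# Mixing logicals with a string coerces the whole column to character
sapply(dt[, .(data1, data2)], class)  # both "character"

# sum() rejects character input, so the old code cannot work on these columns
res <- try(sum(dt$data1), silent = TRUE)
inherits(res, "try-error")  # TRUE

# is.na() works on any type; summing its negation counts the non-NA entries
sum(!is.na(dt$data1))  # 6
```

Counting with sum(!is.na(x)) sidesteps the column type entirely, which is why it works for logical and character columns alike.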

r data.table

Chris Nov 11 '15 at 23:15

3 answers

dt[, lapply(.SD, function(x) sum(!is.na(x))), by = .(group1, group2)]
#   group1 group2 data1 data2
#1:      a      x     1     2
#2:      a      y     1     1
#3:      b      y     1     1
#4:      b      z     3     2
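A variant of the same idea, sketched here rather than taken from the answer: .SDcols (already used in the question's leftover code) restricts .SD to the data columns, and colSums() over the logical matrix !is.na(.SD) replaces the per-column lapply(). Restricting to data1 and data2 is an assumption about which columns should be counted.

```r
library(data.table)

dt <- data.table(
  group1 = c("a", "a", "a", "b", "b", "b", "b"),
  group2 = c("x", "x", "y", "y", "z", "z", "z"),
  data1 = c(NA, rep(TRUE, 3), rep(FALSE, 2), "sometimes"),
  data2 = c("sometimes", rep(FALSE, 3), rep(TRUE, 2), NA)
)

counts <- dt[
  ,
  as.list(colSums(!is.na(.SD))),  # one non-NA count per column of .SD
  by = .(group1, group2),
  .SDcols = c("data1", "data2")   # count only the data columns
]
counts
```

The counts match the lapply() version, but come back as doubles rather than integers, because colSums() always returns a numeric vector.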

eddi Nov 11 '15 at 23:23

With dplyr:

library(dplyr)
dt %>% group_by(group1, group2) %>% summarise_each(funs(sum(!is.na(.))))
Source: local data table [4 x 4]
Groups: group1

  group1 group2 data1 data2
1      a      x     1     2
2      a      y     1     1
3      b      y     1     1
4      b      z     3     2

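summarise_each() and funs() have since been deprecated in dplyr. A sketch of the same count with the newer interface, assuming dplyr 1.0.0 or later (where across() and the .groups argument of summarise() exist):

```r
library(dplyr)
library(data.table)  # only used to build dt as in the question

dt <- data.table(
  group1 = c("a", "a", "a", "b", "b", "b", "b"),
  group2 = c("x", "x", "y", "y", "z", "z", "z"),
  data1 = c(NA, rep(TRUE, 3), rep(FALSE, 2), "sometimes"),
  data2 = c("sometimes", rep(FALSE, 3), rep(TRUE, 2), NA)
)

counts <- dt %>%
  group_by(group1, group2) %>%
  # apply the non-NA count to each data column; drop grouping afterwards
  summarise(across(c(data1, data2), ~ sum(!is.na(.x))), .groups = "drop")
counts
```

Passing .groups = "drop" returns an ungrouped result and avoids the message summarise() otherwise prints about regrouping.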
DatamineR Nov 11 '15 at 23:24

David Arenburg · Accepted Answer · 2015-11-11T23:33:54+0000

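Another option is melt/dcast, which avoids operating on each column separately. Melting with na.rm = TRUE drops the NA values, and dcast then falls back to its default aggregate function, length, which counts the remaining entries: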

dcast(melt(dt, id.vars = c("group1", "group2"), na.rm = TRUE), group1 + group2 ~ variable)
# Aggregate function missing, defaulting to 'length'
#    group1 group2 data1 data2
# 1:      a      x     1     2
# 2:      a      y     1     1
# 3:      b      y     1     1
# 4:      b      z     3     2
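If the "Aggregate function missing" message is unwanted, the aggregate can be passed explicitly. A sketch assuming the same dt, using dcast's fun.aggregate argument so there is nothing for it to default:

```r
library(data.table)

dt <- data.table(
  group1 = c("a", "a", "a", "b", "b", "b", "b"),
  group2 = c("x", "x", "y", "y", "z", "z", "z"),
  data1 = c(NA, rep(TRUE, 3), rep(FALSE, 2), "sometimes"),
  data2 = c("sometimes", rep(FALSE, 3), rep(TRUE, 2), NA)
)

# Long format: one row per (group1, group2, variable, value), NAs removed
long <- melt(dt, id.vars = c("group1", "group2"), na.rm = TRUE)

# Back to wide, counting the surviving rows in each cell
counts <- dcast(long, group1 + group2 ~ variable, fun.aggregate = length)
counts
```

Because the NAs are removed before casting, length() counts only non-NA entries; a group with no non-NA values in a column would get the fill value length(character(0)), that is 0.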
