Delete rows with NA in the group, given that the group contains at least one non NA value

Question

Delete rows with NA in the group, given that the group contains at least one non NA value

Example data

df = structure(list(class = structure(c(4L, 1L, 1L, 3L, 2L), .Label = c("apple", 
"berry", "grape", "orange"), class = "factor"), value = c(NA, 
NA, 1, 1, NA)), .Names = c("class", "value"), row.names = c(NA, 
-5L), class = "data.frame")

which looks like:

   class value
1 orange    NA
2  apple    NA
3  apple     1
4  grape     1
5  berry    NA

How can I delete the rows with NA in a group, but only if that group also contains at least one non-NA value?

Desired output:

   class value
1 orange    NA
2  apple     1
3  grape     1
4  berry    NA

This can be done in three steps using a subset and a merge, but I'm interested in a data.table approach.
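For reference, here is a minimal base R sketch of what a three-step subset-and-merge solution might look like. The question does not show the actual steps, so this is only one possible reading; the names non_na, classes and result are made up for illustration.

```r
# Example data from the question, rebuilt with data.frame() for readability
df <- data.frame(
  class = factor(c("orange", "apple", "apple", "grape", "berry")),
  value = c(NA, NA, 1, 1, NA)
)

# Step 1: keep only the rows that carry a value
non_na <- subset(df, !is.na(value))

# Step 2: list every class once, so that all-NA classes are not lost
classes <- unique(df["class"])

# Step 3: left join the classes onto the non-NA rows;
# classes without any value come back with value = NA
result <- merge(classes, non_na, by = "class", all.x = TRUE)
result
```

Note that merge() sorts the result by class, and that a class consisting of several NA rows would be collapsed to a single NA row in this sketch.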

+4

r data.table

Veerendra gadekar Jul 16 '15 at 19:31

4 answers

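Here is a dplyr option. Within each class, drop a row only when its value is NA and the group contains at least one non-NA value:

library(dplyr)

df %>%
    group_by(class) %>%
    filter(!(is.na(value) & sum(!is.na(value)) > 0)) %>%
    ungroup()

The final ungroup() is optional; it just drops the grouping, so the result is a plain tbl rather than a grouped one.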
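By De Morgan's law the same filter can be written as "keep a row if it has a value, or if its whole group is NA". The sketch below checks that reading on hypothetical data (the class "kiwi" and the names df2 and res are not from the question), where one group has two NA rows and no value at all.

```r
library(dplyr)

# Hypothetical data: "kiwi" is entirely NA, "apple" has one NA and one value
df2 <- data.frame(
  class = c("kiwi", "kiwi", "apple", "apple"),
  value = c(NA, NA, NA, 1)
)

res <- df2 %>%
  group_by(class) %>%
  # keep rows with a value, plus every row of groups that are entirely NA
  filter(!is.na(value) | all(is.na(value))) %>%
  ungroup()

res
```

Both "kiwi" rows survive because their group has no value, while the NA row of "apple" is dropped.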

+4

Felipe Gerard Jul 16 '15 at 20:12

Here is a different data.table approach:

library(data.table)
setDT(df)          # convert to a data.table by reference
setkey(df,class)
df[!is.na(value)][J(unique(df$class))]

#     class value
# 1:  apple     1
# 2:  berry    NA
# 3:  grape     1
# 4: orange    NA

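This works thanks to the default nomatch=NA: a class that has no rows left after the NA values are removed still comes back from the join, with NA as its value. Enter ?data.table in the console for details.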
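To make the role of nomatch more concrete, here is a small sketch that contrasts the default with nomatch = NULL. It uses an explicit on = join instead of a key, and the names dt, kept, lookup, with_na and inner are chosen only for this illustration.

```r
library(data.table)

dt <- data.table(
  class = c("orange", "apple", "apple", "grape", "berry"),
  value = c(NA, NA, 1, 1, NA)
)

kept   <- dt[!is.na(value)]                    # only apple and grape remain
lookup <- data.table(class = unique(dt$class)) # every class, once

# Default nomatch = NA: each class in `lookup` yields a row,
# unmatched classes get value = NA
with_na <- kept[lookup, on = "class"]

# nomatch = NULL: unmatched classes are dropped (inner join)
inner <- kept[lookup, on = "class", nomatch = NULL]

with_na
inner
```

With the default, orange and berry reappear with an NA value; with nomatch = NULL only apple and grape are returned.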

+2

Frank Jul 16 '15 at 20:58

You can create a temporary table holding all rows with NA, then remove all NA rows and add back the classes that were removed completely.

library(data.table)
setDT(df)                   # converts by reference, no reassignment needed
temp <- df[is.na(value)]    # all NA rows
df <- df[!is.na(value)]     # rows that carry a value
# add back the NA rows of classes that no longer appear in df
df <- rbindlist(list(df, temp[!class %in% df[, class]]))
rm(temp)

+1

Dean MacGregor Jul 16 '15 at 19:52

akrun · Accepted Answer · 2015-07-16T19:33:53+0000

We could use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)). Grouped by 'class', we use an if/else condition to check whether 'value' contains any non-NA elements: if it does, we subset .SD to the non-NA rows, otherwise we return .SD unchanged.

library(data.table)
setDT(df)[, if(any(!is.na(value))) .SD[!is.na(value)] else .SD , by = class]
#    class value
#1: orange    NA
#2:  apple     1
#3:  grape     1
#4:  berry    NA

Or we can switch from any to all by slightly changing the condition:

setDT(df)[, if(all(is.na(value))) .SD else .SD[!is.na(value)], by = class]
#    class value
#1: orange    NA
#2:  apple     1
#3:  grape     1
#4:  berry    NA

Or we can get the row index (.I) and then subset the dataset.

indx <- setDT(df)[, if(any(!is.na(value))) .I[!is.na(value)] else .I, class]$V1
df[indx]
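As a quick check, the .SD and .I variants can be compared on slightly larger data. The sketch below uses hypothetical values (the class "kiwi" and the names dt, by_sd and by_index are not part of the answer) so that one group is entirely NA with more than one row.

```r
library(data.table)

# Hypothetical data: "kiwi" has two NA rows and no value,
# "apple" has two NA rows and one value
dt <- data.table(
  class = c("kiwi", "kiwi", "apple", "apple", "apple", "grape"),
  value = c(NA, NA, NA, 1, NA, 2)
)

# Variant 1: subset .SD within each group
by_sd <- dt[, if (any(!is.na(value))) .SD[!is.na(value)] else .SD, by = class]

# Variant 2: collect the row numbers to keep, then subset once
indx <- dt[, if (any(!is.na(value))) .I[!is.na(value)] else .I, by = class]$V1
by_index <- dt[indx]

by_sd
by_index
```

Both variants keep the two "kiwi" rows, the single valued "apple" row and the "grape" row. The .I version avoids building .SD for every group, which may matter on large tables.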
