Delete rows with NA in a group, provided the group contains at least one non-NA value

Example data

df = structure(list(class = structure(c(4L, 1L, 1L, 3L, 2L), .Label = c("apple", 
"berry", "grape", "orange"), class = "factor"), value = c(NA, 
NA, 1, 1, NA)), .Names = c("class", "value"), row.names = c(NA, 
-5L), class = "data.frame")

which looks like:

   class value
1 orange    NA
2  apple    NA
3  apple     1
4  grape     1
5  berry    NA

How can I delete the rows with NA in a group, but only if that group also contains a non-NA value?

Desired output:

   class value
1 orange    NA
2  apple     1
3  grape     1
4  berry    NA

This can be done in three steps using a subset and a merge. I'm interested in a data.table approach.
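For reference, the subset-and-merge route mentioned above might look roughly like the base R sketch below. The question does not show the actual three steps, so the step breakdown and the names non_na, classes and result are assumptions made for illustration.

```r
# Rebuild the example data from the question.
df <- data.frame(
  class = factor(c("orange", "apple", "apple", "grape", "berry")),
  value = c(NA, NA, 1, 1, NA)
)

# Step 1: keep only the rows whose value is not NA.
non_na <- df[!is.na(df$value), ]

# Step 2: one row per class, so that every class survives the merge.
classes <- data.frame(class = unique(df$class))

# Step 3: left join; classes without any non-NA value come back with value = NA.
result <- merge(classes, non_na, by = "class", all.x = TRUE)
result
```

Note that merge sorts the result by class, and that a class with several NA rows and no non-NA value would be collapsed to a single NA row by this sketch.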

4 answers

We could use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)). Grouped by 'class', an if/else checks whether the group has any non-NA element in 'value': if it does, we keep only the non-NA rows of .SD, otherwise we return .SD unchanged.

library(data.table)
setDT(df)[, if(any(!is.na(value))) .SD[!is.na(value)] else .SD , by = class]
#    class value
#1: orange    NA
#2:  apple     1
#3:  grape     1
#4:  berry    NA

Or we can switch from any to all by inverting the condition and swapping the two branches:

setDT(df)[, if(all(is.na(value))) .SD else .SD[!is.na(value)], by = class]
#    class value
#1: orange    NA
#2:  apple     1
#3:  grape     1
#4:  berry    NA
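The any and all versions pick the same branch for every group because "at least one non-NA value" is the exact opposite of "all values are NA". A small base R check of that equivalence, using hypothetical value vectors that are not part of the original data:

```r
# Hypothetical value vectors for three kinds of group.
groups <- list(
  all_na = c(NA, NA),
  mixed  = c(NA, 1),
  no_na  = c(1, 2)
)

# Test used in the first version: does the group have any non-NA value?
has_non_na <- sapply(groups, function(v) any(!is.na(v)))

# Test used in the second version: is every value in the group NA?
all_missing <- sapply(groups, function(v) all(is.na(v)))

# The two tests are opposites for every group.
identical(has_non_na, !all_missing)
```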

Or we can get the row index (.I) and then subset the dataset.

indx <- setDT(df)[, if(any(!is.na(value))) .I[!is.na(value)] else .I, class]$V1
df[indx]

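With dplyr, the same thing can be done like this:

library(dplyr)

df %>%
    group_by(class) %>%
    # drop NA rows only in groups that have at least one non-NA value
    filter(!(is.na(value) & sum(!is.na(value)) > 0)) %>%
    ungroup()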

The final ungroup removes the grouping, so the result is a plain, ungrouped dplyr tbl.


Here is a different data.table approach:

setDT(df)          # in case df is still a data.frame
setkey(df, class)  # sorts by class and marks it as the join key
df[!is.na(value)][J(unique(df$class))]

#     class value
# 1:  apple     1
# 2:  berry    NA
# 3:  grape     1
# 4: orange    NA

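This works because of the default nomatch=NA: classes that have no row left after the NA filter are still returned by the join, with value filled in as NA. Type ?data.table in the console for details.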
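To make the effect of nomatch visible, here is a small sketch that runs the same join twice. It names the join column with on = instead of a key and uses character classes, both of which are choices made for this illustration rather than part of the answer; nomatch = NULL needs a reasonably recent data.table.

```r
library(data.table)

dt <- data.table(
  class = c("orange", "apple", "apple", "grape", "berry"),
  value = c(NA, NA, 1, 1, NA)
)

# After the filter only the apple and grape rows remain.
non_na <- dt[!is.na(value)]

# Default nomatch = NA: classes with no match are kept, with value set to NA.
with_na <- non_na[.(class = unique(dt$class)), on = "class"]

# nomatch = NULL: classes with no match are dropped (an inner join).
inner <- non_na[.(class = unique(dt$class)), on = "class", nomatch = NULL]

with_na
inner
```

With the default, orange and berry reappear with NA; with nomatch = NULL they are lost, which is why the answer relies on the default.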


You can store all the NA rows in a temporary table, then drop every NA row and add back the rows of any class that was removed completely.

df <- setDT(df)
# NA rows, kept whole so that a class with several NA rows does not cause a length mismatch
temp <- df[is.na(value)]
df <- df[!is.na(value)]
# add back only the classes that lost all of their rows
df <- rbindlist(list(df, temp[!class %in% df[, class]]))
rm(temp)

Source: https://habr.com/ru/post/1598280/