Getting a subset of a data frame by searching for records with NA in specific columns

Suppose we have a data frame containing NA values like this:

    > data
     A  B  C  D
     1  3 NA  4
     2  1  3  4
    NA  3  3  5
     4  2 NA NA
     2 NA  4  3
     1  1  1  2

I want to know a general method for getting the subset of rows that have NA values in column C or column A. So the output should be:

     A  B  C  D
     1  3 NA  4
    NA  3  3  5
     4  2 NA NA

I tried using the subset command like so: subset(data, A==NA | C==NA), but that didn't work. Any ideas?

2 answers

Here is one possibility:

    # Read your data
    data <- read.table(text = "
    A B C D
    1 3 NA 4
    2 1 3 4
    NA 3 3 5
    4 2 NA NA
    2 NA 4 3
    1 1 1 2", header = TRUE, sep = "")

    # Now subset your data
    subset(data, is.na(C) | is.na(A))
       A B  C  D
    1  1 3 NA  4
    3 NA 3  3  5
    4  4 2 NA NA
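As for why the original attempt with A==NA returned nothing: in R, any comparison with NA evaluates to NA rather than TRUE, and subset() keeps only the rows where the condition is TRUE. A minimal sketch of this, rebuilding the question's data under the name `data`:

```r
data <- read.table(text = "
A B C D
1 3 NA 4
2 1 3 4
NA 3 3 5
4 2 NA NA
2 NA 4 3
1 1 1 2", header = TRUE)

# Comparing with NA yields NA for every element, never TRUE
cmp <- data$A == NA

# subset() treats NA conditions as FALSE, so no rows survive
empty <- subset(data, A == NA | C == NA)
nrow(empty)   # 0

# is.na() returns a proper TRUE/FALSE vector that can be used as a filter
keep <- is.na(data$A) | is.na(data$C)
keep          # TRUE FALSE TRUE TRUE FALSE FALSE
```

This is why is.na() is the usual way to test for missing values.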

A very handy function for such things is complete.cases. It checks each row for NA values: it returns FALSE for a row that contains any NA, and TRUE for a row that contains none.
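To make the row-wise behaviour concrete, here is a small sketch that assumes the question's data is stored in `df` and compares complete.cases on the whole data frame with complete.cases on columns A and C only:

```r
df <- read.table(text = "
A B C D
1 3 NA 4
2 1 3 4
NA 3 3 5
4 2 NA NA
2 NA 4 3
1 1 1 2", header = TRUE)

# All columns: row 5 counts as incomplete because B is NA
all_cols <- complete.cases(df)
all_cols   # FALSE TRUE FALSE FALSE FALSE TRUE

# Only A and C: row 5 is now complete, since its NA is in B
a_and_c <- complete.cases(df[, c("A", "C")])
a_and_c    # FALSE TRUE FALSE FALSE TRUE TRUE
```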

So you select only the two columns of interest, pass them to complete.cases(), negate the result, and use it to index the rows of the original data, as follows:

    # assuming your data is in 'df'
    df[!complete.cases(df[, c("A", "C")]), ]
    #    A B  C  D
    # 1  1 3 NA  4
    # 3 NA 3  3  5
    # 4  4 2 NA NA
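Since the question asks for a general method, the same idea could be wrapped in a small helper. This is only a sketch: `rows_with_na` is a hypothetical name, not an existing R function, and it assumes `cols` holds valid column names of `df`.

```r
df <- read.table(text = "
A B C D
1 3 NA 4
2 1 3 4
NA 3 3 5
4 2 NA NA
2 NA 4 3
1 1 1 2", header = TRUE)

# Hypothetical helper: return the rows of df with an NA in any of the given columns.
# drop = FALSE keeps a data frame even when a single column is selected.
rows_with_na <- function(df, cols) {
  df[!complete.cases(df[, cols, drop = FALSE]), , drop = FALSE]
}

res <- rows_with_na(df, c("A", "C"))   # rows 1, 3 and 4
res_b <- rows_with_na(df, "B")         # row 5 only
```

The drop = FALSE arguments matter when `cols` has length one; without them, `df[, cols]` would collapse to a plain vector.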

Source: https://habr.com/ru/post/1491192/

