How to filter a data.frame by a factor that includes NA as a level

If you filter a data.frame on a factor level other than NA, everything works without any problems.

    set.seed(123)
    df <- data.frame(a = factor(as.character(c(1, 1, 2, 2, 3, NA, 3, NA)), exclude = NULL),
                     b = runif(8))
    # str(df)
    df[df$a == 3, ]
    #   a         b
    # 5 3 0.9404673
    # 7 3 0.5281055

Problems arise when you need to filter by the NA level. None of the following work:

    df[df$a == NA, ]
    df[df$a == "NA", ]
    df[is.na(df$a), ]
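
To see why these attempts fail, it can help to look at what each comparison actually returns. The following is a minimal sketch that rebuilds the same df so it runs on its own; the names eq_na and eq_str are introduced here only for illustration.

```r
set.seed(123)
df <- data.frame(
  a = factor(as.character(c(1, 1, 2, 2, 3, NA, 3, NA)), exclude = NULL),
  b = runif(8)
)

# Comparing anything with NA yields NA, so no element is ever TRUE.
eq_na <- df$a == NA
all(is.na(eq_na))          # TRUE

# The string "NA" is not the NA level, so no element compares equal.
eq_str <- df$a == "NA"
sum(eq_str, na.rm = TRUE)  # 0

# is.na() on a factor looks at the integer codes. Rows 6 and 8 carry a
# valid code that points at the NA level, so nothing is reported missing.
any(is.na(df$a))           # FALSE
```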

The only way I found is to convert the factor to numeric and compare it with the position of the NA level (here 4, the number of levels).

    df[as.numeric(df$a) == 4, ]
    #      a         b
    # 6 <NA> 0.0455565
    # 8 <NA> 0.8924190
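
The value 4 works only because NA happens to be the fourth level of df$a. A sketch that looks the position up instead of hard-coding it, assuming the same df as in the question (na_code and res are names made up for this example):

```r
set.seed(123)
df <- data.frame(
  a = factor(as.character(c(1, 1, 2, 2, 3, NA, 3, NA)), exclude = NULL),
  b = runif(8)
)

levels(df$a)       # "1" "2" "3" NA
as.integer(df$a)   # 1 1 2 2 3 4 3 4

# Position of the NA level among the levels; match() treats NA as matchable.
na_code <- match(NA, levels(df$a))

# Keep the rows whose integer code points at the NA level.
res <- df[as.integer(df$a) == na_code, ]
res
```

This assumes every element of the factor has a valid code; if the factor also held true missing values, the comparison would return NA for them and the subset would contain all-NA rows.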

Is there a more intuitive or elegant way to get the same result?

3 answers

Check whether the level corresponding to each element of df$a is NA:

    df[is.na(levels(df$a)[df$a]), ]
    #      a         b
    # 6 <NA> 0.1649003
    # 8 <NA> 0.6556045

As Frank noted, this also selects observations where the value of df$a itself, and not just its level, is NA. I assume the original poster wanted to include those cases. If not, you can do something like:

    x <- factor(c("A", "B", NA), levels = c("A", NA), exclude = NULL)
    i <- which(is.na(levels(x)[x]))
    i[!is.na(x[i])]

This gives 3: only the element with the NA level, leaving out the element with an unknown level (B).
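
The distinction comes down to the integer codes stored behind the factor. A short sketch on the same x, showing that the unknown value "B" has a missing code while the NA level has an ordinary one (codes, val_na and lev_na are illustrative names):

```r
x <- factor(c("A", "B", NA), levels = c("A", NA), exclude = NULL)

# "B" is not among the levels, so its code is missing;
# the NA element is mapped to the second level, which is NA.
codes <- as.integer(x)          # 1 NA 2

# is.na() on the factor flags only the missing code.
val_na <- is.na(x)              # FALSE TRUE FALSE

# Looking up the level flags the missing code and the NA level alike.
lev_na <- is.na(levels(x)[x])   # FALSE TRUE TRUE

# Elements that carry the NA level but are not missing themselves.
which(lev_na & !val_na)         # 3
```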


If you also have true missing values (which are not related to factor levels):

    DF = data.frame(
      x = factor(c("A", "B", NA), levels = c("A", NA), exclude = NULL),
      v = 1:3
    )

In row 3, x has the NA level, while in row 2 it is a true missing value.

To get only row 3, you can do a join with data.table:

    library(data.table)
    setDT(DF)
    merge(DF, data.table(x = factor(NA_character_, exclude = NULL)))
    # or
    DF[.(factor(NA_character_, exclude = NULL)), on = .(x), nomatch = 0]
    #     x v
    # 1: NA 3

Or, somewhat more awkwardly, in dplyr:

    dplyr::right_join(DF, data.frame(x = factor(NA_character_, levels = levels(DF$x), exclude = NULL)))
    # Joining, by = "x"
    #      x v
    # 1 <NA> 3

I could not find any way to get there in base R, except for the crazy:

    wv = which(is.na(levels(DF$x)))
    DF[ !is.na(DF$x) & as.integer(DF$x) == wv, ]
    #      x v
    # 3 <NA> 3

I agree that it is a little strange that is.na() does not pick up NA levels in factors. But this works:

    set.seed(123)
    df <- data.frame(a = factor(as.character(c(1, 1, 2, 2, 3, NA, 3, NA)), exclude = NULL),
                     b = runif(8))
    df[is.na(as.character(df$a)), ]
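
Note that as.character() turns both the NA level and a true missing value into NA, so this approach does not distinguish the two. One possible way to wrap the idea in a reusable function is sketched below; is_na_level and its include_missing argument are hypothetical names, not part of base R.

```r
# Hypothetical helper: TRUE for elements of a factor whose level is NA.
# With include_missing = TRUE, true missing values are flagged as well.
is_na_level <- function(f, include_missing = FALSE) {
  stopifnot(is.factor(f))
  # NA after conversion means either the NA level or a missing code.
  na_chr <- is.na(as.character(f))
  if (include_missing) {
    na_chr
  } else {
    # is.na() on the factor is TRUE only for missing codes; exclude those.
    na_chr & !is.na(f)
  }
}

x <- factor(c("A", "B", NA), levels = c("A", NA), exclude = NULL)
strict <- is_na_level(x)                          # FALSE FALSE TRUE
loose  <- is_na_level(x, include_missing = TRUE)  # FALSE TRUE TRUE
```

With the df from the question, df[is_na_level(df$a), ] should select rows 6 and 8, since that factor has no true missing values.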

Source: https://habr.com/ru/post/1272105/

