When subsetting rows with equality (==) on a factor, NA rows are also included. This does not happen with %in%. Is this normal?

Suppose I have a factor A with 3 levels A1, A2, A3, plus NA. Each of them appears in 10 cases, so 40 cases in total. If I do

subset1 <- df[df$A=="A1",]  
dim(subset1)  # 20, i.e., 10 for A1 and 10 for NA's
summary(subset1$A) # both A1 and NA have non-zero counts
subset2 <- df[df$A %in% c("A1"),] 
dim(subset2)  # 10, as expected
summary(subset2$A) # only A1 has non-zero count

The same happens whether the variable used for subsetting is a factor or an integer. Is this how comparison operators (and >, <) work? So should I just stick to %in% for factors and always include !is.na() when using comparisons? Thanks!
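A reproducible version of the setup described above may help. This is a sketch under assumptions: the data frame and its values are hypothetical, built to match the description (a factor A with levels A1, A2, A3, 10 cases each, plus 10 NAs), and the extra columns id and n are illustrative additions, not from the question.

```r
# Hypothetical data matching the question: 10 each of A1, A2, A3 and NA
df <- data.frame(
  id = 1:40,
  A  = factor(rep(c("A1", "A2", "A3", NA), each = 10), levels = c("A1", "A2", "A3"))
)

# Subsetting with == keeps the NA positions, which come back as all-NA rows
subset1 <- df[df$A == "A1", ]
nrow(subset1)          # 20
sum(is.na(subset1$A))  # 10

# Subsetting with %in% drops them
subset2 <- df[df$A %in% "A1", ]
nrow(subset2)          # 10

# The same happens with a (hypothetical) integer column
df$n <- rep(c(1L, 2L, 3L, NA), each = 10)
nrow(df[df$n == 1L, ])    # 20
nrow(df[df$n %in% 1L, ])  # 10
```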


Yes, == and %in% handle NA differently; see the help page for "%in%" (?"%in%").

# Data...
x <- c("A",NA,"A")

# When NA is encountered NA is returned
# Philosophically correct - who knows whether the
# missing value is equal to "A"?!
x=="A"
#[1] TRUE   NA TRUE
x[x=="A"]
#[1] "A" NA  "A"

# When NA is encountered by %in%, FALSE is returned, rather than NA
x %in% "A"
#[1]  TRUE FALSE  TRUE
x[ x %in% "A" ]
#[1] "A" "A"
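The same NA propagation applies to the other comparison operators (!=, <, > and so on), which covers the second part of the question. A quick check, assuming nothing beyond base R; it also shows one detail worth knowing: %in% treats NA as an ordinary value, so an NA in x does match an NA in the table.

```r
x <- c("A", NA, "A")

# Every comparison operator propagates NA
x != "A"            # FALSE NA FALSE
c(1, NA, 3) > 2     # FALSE NA TRUE

# %in% never returns NA; an NA in x matches an NA in the table
x %in% c("A", NA)   # TRUE TRUE TRUE
NA %in% "A"         # FALSE
```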

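The reason is in the definition of %in%, which is a thin wrapper around match:

# How %in% is defined in base R (shown for reference; no need to redefine it)
"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0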

To make a match-based operator return NA for missing values, as == does, set nomatch to NA instead of 0:

"%in2%" <- function(x,table) match(x, table, nomatch = NA_integer_) > 0
x %in2% "A"
#[1] TRUE   NA TRUE
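One caveat with %in2%: nomatch = NA_integer_ turns every non-match into NA, not only missing values, so it agrees with == on the x above but not in general. The sketch below compares the operators on a vector that also contains a non-matching value; the vector y and the operator %in3% are hypothetical illustrations, not base R.

```r
# Redefined here so the snippet is self-contained
"%in2%" <- function(x, table) match(x, table, nomatch = NA_integer_) > 0

# Hypothetical variant: ordinary %in%, then NA only where x itself is NA
"%in3%" <- function(x, table) replace(x %in% table, is.na(x), NA)

y <- c("A", NA, "B")

y == "A"      # TRUE NA FALSE
y %in% "A"    # TRUE FALSE FALSE
y %in2% "A"   # TRUE NA NA    ("B" gives NA, unlike ==)
y %in3% "A"   # TRUE NA FALSE (same as ==)
```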

Yes, this is standard behaviour in R.

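Any comparison with NA gives NA, and indexing a vector with NA gives NA. If you don't want NAs in the result, use which(), which ignores NAs.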

For example:

x <- 1:10
y <- x
y[4] <- NA
ix1 <- which(x < 5)
ix2 <- which(y < 5)  # which() skips the NA at position 4
x[ix1]
#[1] 1 2 3 4
y[ix2]
#[1] 1 2 3

Versus:

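x[x < 5]
#[1] 1 2 3 4
y[y < 5]
#[1]  1  2  3 NA

# The logical index is NA at position 4, and indexing with NA returns NA:
y < 5
#[1]  TRUE  TRUE  TRUE    NA FALSE FALSE FALSE FALSE FALSE FALSE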

So instead of v[logicalCondition], I prefer to compute the indices first: ixSelect <- which(logicalCondition). If you want to keep the NAs, use which(logicalCondition | is.na(v)).
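Applied to a data frame shaped like the one in the question, the common ways to drop (or deliberately keep) the NA rows look like this. It is a sketch: df is a hypothetical stand-in rebuilt here, and subset() is base R, whose documentation states that missing values in the condition are taken as false.

```r
# Hypothetical stand-in for the question's data
df <- data.frame(
  id = 1:40,
  A  = factor(rep(c("A1", "A2", "A3", NA), each = 10))
)

# 1. Exclude NA explicitly before comparing (FALSE & NA is FALSE)
s1 <- df[!is.na(df$A) & df$A == "A1", ]

# 2. which() returns only the TRUE positions, dropping NA
s2 <- df[which(df$A == "A1"), ]

# 3. subset() treats NA in the condition as FALSE
s3 <- subset(df, A == "A1")

# 4. Keep the NA rows on purpose (NA | TRUE is TRUE)
s4 <- df[which(df$A == "A1" | is.na(df$A)), ]

sapply(list(s1, s2, s3, s4), nrow)  # 10 10 10 20
```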


Source: https://habr.com/ru/post/1543224/

