I am trying to change the values of a variable to NA values if they are not in the vector:
sample <- factor(c('01', '014', '1', '14', '24'))
df <- data.frame(var1 = 1:6, var2 = factor(c('01', '24', 'none', '1', 'unknown', '24')))
df$var2 <- ifelse(df$var2 %in% sample, df$var2, NA)
For some reason, R does not save the initial values of the factor variable, but turns them into a numerical sequence:
> sample <- factor(c('01', '014', '1', '14', '24'))
> df <- data.frame(var1 = 1:6,
var2 = factor(c('01', '24', 'none', '1', 'unknown', '24')))
> class(df$var2)
[1] "factor"
> df
var1 var2
1 1 01
2 2 24
3 3 none
4 4 1
5 5 unknown
6 6 24
> df$var2 <- ifelse(df$var2 %in% sample, df$var2, NA)
> class(df$var2)
[1] "integer"
> df
var1 var2
1 1 1
2 2 3
3 3 NA
4 4 2
5 5 NA
6 6 3
Why is this happening and what will be the right way to achieve what I'm trying here?
(I need to use factors, not integers, so as not to confuse “01” and “1”, and my original data set is large, so using factors, not characters, should save me some memory)