R: Why does ifelse coerce a factor to an integer?

I am trying to set the values of a variable to NA if they are not in a given vector:

sample <- factor(c('01', '014', '1', '14', '24'))
df <- data.frame(var1 = 1:6, var2 = factor(c('01', '24', 'none', '1', 'unknown', '24')))
df$var2 <- ifelse(df$var2 %in% sample, df$var2, NA)

For some reason, R does not keep the original values of the factor variable but turns them into a sequence of numbers:

> sample <- factor(c('01', '014', '1', '14', '24'))
> df <- data.frame(var1 = 1:6, 
                   var2 = factor(c('01', '24', 'none', '1', 'unknown', '24')))
> class(df$var2)
[1] "factor"
> df
  var1    var2
1    1      01
2    2      24
3    3    none
4    4       1
5    5 unknown
6    6      24
> df$var2 <- ifelse(df$var2 %in% sample, df$var2, NA)
> class(df$var2)
[1] "integer"
> df
  var1 var2
1    1    1
2    2    3
3    3   NA
4    4    2
5    5   NA
6    6    3

Why is this happening, and what is the right way to do this?

(I need factors, not integers, so that “01” and “1” are not confused, and my original data set is large, so using factors instead of characters should save some memory.)

1 answer

I think one way to achieve what you are trying to do is to change your factor levels:

levels(df$var2)[!levels(df$var2) %in% sample] <- NA

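This sets the levels that are not in sample to NA, so the values that had those levels become NA as well, and those levels disappear from the factor:

> df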
  var1 var2
1    1   01
2    2   24
3    3 <NA>
4    4    1
5    5 <NA>
6    6   24

> df$var2
[1] 01   24   <NA> 1    <NA> 24  
Levels: 01 1 24
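The same idea can be wrapped in a small helper if you need it for several columns. This is a minimal sketch: the function name `na_outside` is hypothetical, not part of base R, and it relies only on the `levels<-` behaviour shown above, where levels set to NA are dropped and their values become NA.

```r
sample <- factor(c('01', '014', '1', '14', '24'))
df <- data.frame(var1 = 1:6,
                 var2 = factor(c('01', '24', 'none', '1', 'unknown', '24')))

# Hypothetical helper: every value of factor `f` whose level is not in `keep`
# becomes NA, because those levels are set to NA (which also removes them).
# %in% compares factors by their labels, so `keep` may be a factor or character.
na_outside <- function(f, keep) {
  levels(f)[!levels(f) %in% keep] <- NA
  f
}

df$var2 <- na_outside(df$var2, sample)
df$var2
```

Because only the levels vector is modified, the integer codes are never exposed, and the result stays a factor.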

Alternatively, you can assign NA to the values directly, but note that the levels not in sample are kept:

df$var2[!df$var2 %in% sample] <- NA

> df
  var1 var2
1    1   01
2    2   24
3    3 <NA>
4    4    1
5    5 <NA>
6    6   24


> df$var2
[1] 01   24   <NA> 1    <NA> 24  
Levels: 01 1 24 none unknown
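If you prefer the direct assignment but do not want to carry the unused levels around, base R's `droplevels()` can remove them afterwards. A minimal sketch, reusing the data from the question:

```r
sample <- factor(c('01', '014', '1', '14', '24'))
df <- data.frame(var1 = 1:6,
                 var2 = factor(c('01', '24', 'none', '1', 'unknown', '24')))

# Replace values that are not in `sample` with NA; all five levels are kept
df$var2[!df$var2 %in% sample] <- NA

# droplevels() removes the levels that no longer occur ("none", "unknown"),
# so the remaining levels are "01", "1" and "24"
df$var2 <- droplevels(df$var2)
levels(df$var2)
```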

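As for why ifelse does this: a factor is stored as integer codes that index its levels, and ifelse builds its result from test (a plain logical vector) and copies the selected elements of yes into it. The factor's class and levels attributes are therefore lost and only the codes come back (here "01" = 1, "1" = 2, "24" = 3). The examples in ?ifelse warn about exactly this: "ifelse() strips attributes. This is important when working with Dates and factors."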
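To see what ifelse actually returns for the data in the question, the following sketch makes the integer codes visible. It uses only base R (`typeof()`, `levels()`, `as.integer()`, `ifelse()`); the last line, which maps the codes back to labels, only illustrates that the numbers index the levels and is not meant as the recommended fix.

```r
sample <- factor(c('01', '014', '1', '14', '24'))
f <- factor(c('01', '24', 'none', '1', 'unknown', '24'))

# A factor is an integer vector of codes plus a "levels" attribute
typeof(f)       # "integer"
levels(f)       # "01" "1" "24" "none" "unknown"
as.integer(f)   # 1 3 4 2 5 3

# ifelse() starts from the logical `test` and fills in elements of `yes`,
# so only the codes survive: the result is a plain integer vector
res <- ifelse(f %in% sample, f, NA)
res             # 1 3 NA 2 NA 3

# The codes index levels(f), so they can be turned back into labels
factor(levels(f)[res], levels = levels(f))
```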

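As @tchakravarty pointed out, dplyr's if_else() is a stricter alternative to ifelse: it checks that the true and false branches have the same type.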


Source: https://habr.com/ru/post/1660372/

