Recoding variables in R using a lookup table

I have a question about recoding data. I would like to use a lookup table, and I am wondering how to recode NA and use an approach similar to %in%.

Sample data:

gender <- c("Female", "Not Disclosed", "Unknown", "Male", "Male", "Female", NA)
df_gender <- as.data.frame(gender)
df_gender$gender <- as.character(gender)

My first approach to recoding:

df_gender$gender[df_gender$gender == "Female"] <- "F"
df_gender$gender[df_gender$gender == "Male"] <- "M"
df_gender$gender[df_gender$gender %in% c("Unknown", "Not Disclosed", NA)] <- "Missing"

This approach works properly. However, it is tedious when there are many variables and can lead to many lines of code. I would like to use a lookup table instead, as in this second approach I tried:

df_gender2 <- as.data.frame(gender)
df_gender2$gender <- as.character(gender)

gender_lookup <- c(Female = "F", Male = "M", Unknown = "Missing", "Not Disclosed" = "Missing")
df_gender2$gender <- gender_lookup[df_gender2$gender]
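For reference, indexing a named character vector by name returns NA for an NA index (and for any name the vector does not contain), which is why the missing value passes through the lookup unchanged. A minimal check, reusing the names from the code above; `recoded` is a name introduced here only for illustration:

```r
gender <- c("Female", "Not Disclosed", "Unknown", "Male", "Male", "Female", NA)
gender_lookup <- c(Female = "F", Male = "M", Unknown = "Missing", "Not Disclosed" = "Missing")

# Indexing by name keeps the names, so unname() is used to get a plain vector
recoded <- unname(gender_lookup[gender])

# The last element is still NA: an NA index always yields NA
is.na(recoded)
```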

It works, but it does not recode NA. Is there a way to combine "Not Disclosed" and "Unknown" and set them both to "Missing" without typing them out separately? Secondly, using the lookup table, is there a way to also recode NA to "Missing"?
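One possible way to get both behaviours is sketched below. It is only a sketch and not necessarily the best approach: it describes the lookup as a list with one entry per target code, then uses match(), which (like %in%) treats NA as matching NA. The names `groups`, `keys`, `values` and `recoded` are hypothetical and chosen just for this example.

```r
gender <- c("Female", "Not Disclosed", "Unknown", "Male", "Male", "Female", NA)

# One entry per target code; each entry lists the source values mapped to it.
# NA is listed explicitly so that it is recoded as well.
groups <- list(
  F       = "Female",
  M       = "Male",
  Missing = c("Unknown", "Not Disclosed", NA)
)

# Flatten the list into parallel vectors of source values and target codes
keys   <- unlist(groups, use.names = FALSE)
values <- rep(names(groups), lengths(groups))

# match() returns the position of each value in keys; NA matches NA
recoded <- values[match(gender, keys)]
recoded
```

With this layout, adding another source value to a group means editing a single vector. Values that appear in none of the groups come back as NA rather than being silently assigned a code.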

Source: https://habr.com/ru/post/1688983/