Undefined levels in factor()

I work with a dataset in R that comes with a codebook telling me what the labels should be for the levels of my factor variables. For example, using the codebook, I see that in my variable "Sex", 0 is "Women" and 1 is "Male". I use this information to label the values in my variables appropriately.

However, to my regret, I recently discovered that the codebook is incomplete. For example, for one variable it tells me that 1 is "Yes" and 2 is "No", but it does not tell me what the 7s, 8s, and 9s that I see in the data mean. What I would like to do is label this variable as follows (or something like it):

data$variable <- factor(data$variable, levels=c(1, 2, 7, 8, 9), labels=c("Yes", "No", "7", "8", "9")) 

Basically, I would like every level that is not specified in the codebook to be labeled as itself. The problem is that quite a few levels are missing from the codebook, and I would really rather not have to look through all the undefined values in my data by hand in order to write code like the above for each variable. Also, if I simply leave these levels out, R automatically turns them into NA, which I don't want.

Summary: I am trying to figure out how to use factor() so that, instead of turning all unspecified levels into NA, it labels them as themselves.

1 answer

You can change the levels after creating the factor, so we can use that to our advantage.

    mydat <- c(1, 2, 3, 2, 3, 4, 3, 2, 1, 2, 4, 4, 6, 5, 7, 8, 9)

    # Convert to a factor, ignoring the codebook
    dat <- factor(mydat)

    # Create a map corresponding to the codebook levels
    mymap <- c("1" = "Yes", "2" = "No")

    # Figure out which levels are accounted for by the codebook
    id <- levels(dat) %in% names(mymap)

    # Convert those levels to the appropriate labels
    levels(dat)[id] <- mymap[levels(dat)[id]]
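To see what this buys you, it may help to compare it with passing only the codebook levels to factor(). The sketch below reuses mydat and mymap from the answer; the name `strict` is made up for illustration.

```r
mydat <- c(1, 2, 3, 2, 3, 4, 3, 2, 1, 2, 4, 4, 6, 5, 7, 8, 9)
mymap <- c("1" = "Yes", "2" = "No")

# Passing only the codebook levels turns every other value into NA
strict <- factor(mydat, levels = names(mymap), labels = mymap)
n_lost <- sum(is.na(strict))

# Relabelling after the fact keeps the unlisted values as their own levels
dat <- factor(mydat)
id <- levels(dat) %in% names(mymap)
levels(dat)[id] <- mymap[levels(dat)[id]]

print(n_lost)       # number of values that became NA in the strict version
print(levels(dat))  # codebook labels first, remaining codes unchanged
print(anyNA(dat))   # FALSE: nothing was dropped
```

With this data, the strict version loses the 11 values that are not 1 or 2, while the relabelled factor keeps all 17.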

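Alternatively (and probably a little easier):

    # Alternatively, we can build the map from two parallel vectors:
    # the coded values and their codebook labels
    val <- c(1, 2)
    lev <- c("Yes", "No")
    dat <- factor(mydat)

    # Look the codes up by level name rather than by position, so this
    # still works when the codes are not the first levels (e.g. 7, 8, 9).
    # Assumes every value in val actually occurs in the data.
    levels(dat)[match(val, levels(dat))] <- lev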
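Since the question mentions that several variables are affected, the first approach could be wrapped in a small function and applied per variable. This is only a sketch: `label_known`, the example data frame, and the `codebook` list are hypothetical names and values, not something from the original dataset.

```r
# Hypothetical helper: label the values listed in `map`, keep all others as-is
label_known <- function(x, map) {
  f <- factor(x)
  known <- levels(f) %in% names(map)
  levels(f)[known] <- map[levels(f)[known]]
  f
}

# Made-up data and codebook, for illustration only
data <- data.frame(
  sex    = c(0, 1, 1, 0, 9),
  smoker = c(1, 2, 7, 8, 2)
)
codebook <- list(
  sex    = c("0" = "Women", "1" = "Male"),
  smoker = c("1" = "Yes", "2" = "No")
)

# Apply each variable's map; codes missing from the codebook stay as themselves
for (v in names(codebook)) {
  data[[v]] <- label_known(data[[v]], codebook[[v]])
}

print(levels(data$sex))
print(levels(data$smoker))
```

Keeping the codebook as a named list of named vectors means a newly documented code only needs to be added in one place.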

Source: https://habr.com/ru/post/1439674/

