Undefined levels in factor()

Question

I work with a dataset in R that comes with a codebook, which tells me what the labels should be for the different levels of my factor variables. For example, the codebook says that in my variable “Sex”, 0 is “Women” and 1 is “Male”. I use this information to label the values in my variables appropriately.

However, to my regret, I recently discovered that the codebook is incomplete. For example, for one variable it tells me that 1 is “Yes” and 2 is “No”, but it does not say what the 7s, 8s and 9s that I see in the data mean. What I would like to do is label this variable as follows (or something like it):

data$variable <- factor(data$variable, levels=c(1, 2, 7, 8, 9), labels=c("Yes", "No", "7", "8", "9"))

Basically, I would like every level that is not specified in the codebook to be labelled as itself. The problem is that many variables have levels missing from the codebook, and I would really rather not have to look through all the undefined values in my data by hand in order to build the code above for each variable. Also, if I simply leave these missing levels out, R automatically turns them into NA, which I don't want.

Summary: I am trying to figure out how to use factor() in such a way that, instead of turning all unspecified levels into NA, it labels them as themselves.
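A small reproduction of the NA behaviour, as a sketch with made-up values (the vector `x` and its contents are assumptions for illustration, not taken from the real dataset):

```r
# Made-up codes: 1 and 2 are in the codebook, 7, 8 and 9 are not
x <- c(1, 2, 7, 8, 9)

# Listing only the codebook levels turns every other value into NA
f <- factor(x, levels = c(1, 2), labels = c("Yes", "No"))
print(f)

# Count how many values were lost to NA
sum(is.na(f))
```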


r na labels r-factor

Rickyb Oct 14 '12 at 18:39


1 answer

Dason · Accepted Answer · 2012-10-14T18:48:50+0000

You can change the levels after the factor has been created, and we can use that to our advantage.

    mydat <- c(1, 2, 3, 2, 3, 4, 3, 2, 1, 2, 4, 4, 6, 5, 7, 8, 9)
    # Convert to factor, ignoring the codebook
    dat <- factor(mydat)
    # Create a map corresponding to the codebook levels
    mymap <- c("1" = "Yes", "2" = "No")
    # Figure out which levels are accounted for by the codebook
    id <- levels(dat) %in% names(mymap)
    # Convert those levels to the codebook labels
    levels(dat)[id] <- mymap[levels(dat)[id]]
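If the same relabelling is needed in several places, the steps above could be wrapped in a small function. This is only a sketch: the name `label_known_levels` and its arguments are made up here and are not part of base R or of the answer itself.

```r
# Hypothetical helper (name and signature are assumptions): relabel only
# the levels found in the codebook map and leave all other levels as-is.
label_known_levels <- function(x, map) {
  f <- factor(x)
  # Which existing levels does the codebook know about?
  known <- levels(f) %in% names(map)
  # Replace just those levels with their codebook labels
  levels(f)[known] <- map[levels(f)[known]]
  f
}

mydat <- c(1, 2, 3, 2, 3, 4, 3, 2, 1, 2, 4, 4, 6, 5, 7, 8, 9)
dat <- label_known_levels(mydat, c("1" = "Yes", "2" = "No"))

levels(dat)
# "Yes" "No" "3" "4" "5" "6" "7" "8" "9"
```

Because the map is looked up by level name, codes that are in the codebook but absent from the data are simply ignored, and no value becomes NA.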

Alternatively (and probably a little easier)

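    # Alternatively, we can build the mapping from two vectors:
    # the codes and their codebook labels
    val <- c(1, 2)
    lev <- c("Yes", "No")
    dat <- factor(mydat)
    # match() looks the codes up among the levels, so this also works when
    # a code is not at the position equal to its value
    # (assumes every code in val actually occurs in the data)
    levels(dat)[match(as.character(val), levels(dat))] <- lev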
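Since the question is about doing this for many variables at once, here is a rough sketch of how the map-based approach might be looped over a data frame. The data frame `data`, the column names `smoker` and `sex`, and the `codebook` list are all invented for illustration.

```r
# Toy data and codebook, made up for this example
data <- data.frame(
  smoker = c(1, 2, 7, 1, 9, 2),
  sex    = c(0, 1, 1, 0, 8, 0)
)
codebook <- list(
  smoker = c("1" = "Yes", "2" = "No"),
  sex    = c("0" = "Women", "1" = "Male")
)

# Relabel each variable that has a codebook entry; levels that the
# codebook does not mention keep their original value as the label
for (v in names(codebook)) {
  f <- factor(data[[v]])
  known <- levels(f) %in% names(codebook[[v]])
  levels(f)[known] <- codebook[[v]][levels(f)[known]]
  data[[v]] <- f
}

levels(data$smoker)
# "Yes" "No" "7" "9"
levels(data$sex)
# "Women" "Male" "8"
```

Keeping the codebook as a named list of named character vectors means a new variable only needs one more list entry, not another hand-written factor() call.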
