Zip code subset (factor levels)

Question

I have a problem where I want to match the start zip code and the end zip code of a very large survey dataset and put these results in a new data frame. I created an example data frame to use for illustrative purposes.

  ID = c(1,2,3,4,5)
  StartPC = c("AF2 4RE","AF3 5RE","AF1 3DR","AF2 4RE","AF2 4PE")
  EndPC = c("AF2 4RE","NA","AF2 3DR","AX2 4RE","AF2 4PE")
  data<-data.frame(ID,StartPC,EndPC)
  data2 <- subset(data, StartPC==EndPC,na.rm=TRUE)

Using the code above, I want to create a dataframe (data2) that only includes ID numbers in which the start and end zip codes are the same. However, I get the error message:

Error in Ops.factor(StartPC, EndPC) : level sets of factors are different

The output should contain only IDs 1 and 5 in the new data frame.

r subset

KT_1 Dec 7 '11 at 16:15

1 answer

Spacedman · Accepted Answer · 2011-12-07T16:25:24+0000

The cause is in the error message:

  Error in Ops.factor(StartPC, EndPC) : level sets of factors are different

Your two columns are factors, not character vectors. Factors are categorical variables stored as integer codes plus a "levels" lookup table. Comparing them really compares the underlying integers, so R first makes sure that both factors have the same levels. If they don't, it assumes something has gone wrong and stops with the error above.
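To make the integers-plus-lookup-table idea concrete, here is a minimal sketch that builds the two columns as factors by hand (using factor() directly is an assumption made so the example behaves the same regardless of how data.frame() treats strings in your R version) and inspects their internals:

```r
# Build the two columns explicitly as factors, using the question's values.
StartPC <- factor(c("AF2 4RE", "AF3 5RE", "AF1 3DR", "AF2 4RE", "AF2 4PE"))
EndPC   <- factor(c("AF2 4RE", "NA", "AF2 3DR", "AX2 4RE", "AF2 4PE"))

# Each factor carries its own lookup table of levels ...
levels(StartPC)
levels(EndPC)

# ... and stores only integer codes that index into that table.
as.integer(StartPC)
as.integer(EndPC)

# The two lookup tables differ, so the same integer code can stand for
# different zip codes in each column.
setequal(levels(StartPC), levels(EndPC))
```

Because the level sets differ, StartPC == EndPC on these two factors is what triggers the error.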

So, convert to character:

  > subset(data, as.character(StartPC)==as.character(EndPC),na.rm=TRUE)
    ID StartPC   EndPC
  1  1 AF2 4RE AF2 4RE
  5  5 AF2 4PE AF2 4PE

Either do this on the fly as above, build your data frame with character columns in the first place, or make sure that both columns are created with the same levels.
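The other two options might look like the sketch below. The names data_chr, data_fct, all_levels, same_chr and same_fct are made up for illustration, and building the shared levels with union() is just one possible way to do it:

```r
ID      <- c(1, 2, 3, 4, 5)
StartPC <- c("AF2 4RE", "AF3 5RE", "AF1 3DR", "AF2 4RE", "AF2 4PE")
EndPC   <- c("AF2 4RE", "NA", "AF2 3DR", "AX2 4RE", "AF2 4PE")

# Option 1: keep the zip codes as character columns from the start.
data_chr <- data.frame(ID, StartPC, EndPC, stringsAsFactors = FALSE)
same_chr <- subset(data_chr, StartPC == EndPC)

# Option 2: keep factors, but give both columns the same set of levels.
all_levels <- union(StartPC, EndPC)
data_fct <- data.frame(
  ID,
  StartPC = factor(StartPC, levels = all_levels),
  EndPC   = factor(EndPC, levels = all_levels)
)
same_fct <- subset(data_fct, StartPC == EndPC)

same_chr$ID
same_fct$ID
```

Both variants should keep only the rows where the start and end zip codes match. For a very large survey dataset, plain character columns are probably the simpler choice unless you need factors elsewhere.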
