Zip code subset (factor levels)

I have a problem where I want to match the start zip code and the end zip code of a very large survey dataset and put these results in a new data frame. I created an example data frame to use for illustrative purposes.

  ID = c(1,2,3,4,5)
  StartPC = c("AF2 4RE","AF3 5RE","AF1 3DR","AF2 4RE","AF2 4PE")
  EndPC = c("AF2 4RE","NA","AF2 3DR","AX2 4RE","AF2 4PE")
  data <- data.frame(ID,StartPC,EndPC)
  data2 <- subset(data, StartPC==EndPC,na.rm=TRUE)

Using the code above, I want to create a dataframe (data2) that only includes ID numbers in which the start and end zip codes are the same. However, I get the error message:

  Error in Ops.factor(StartPC, EndPC) : level sets of factors are different

The expected output is a new data frame that contains only IDs 1 and 5.

1 answer

That is because of what the error message says:

  Error in Ops.factor(StartPC, EndPC) : level sets of factors are different 

Your two columns are factors, not character vectors. Factors are categorical variables that are stored as integer codes plus a "levels" lookup table. A comparison between two factors is only meaningful when both share the same set of levels, so R checks this first and stops with this error when the level sets differ.
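To make the storage model concrete, here is a minimal sketch that builds the two columns as factors explicitly and inspects them. The names start_pc, end_pc, same_levels and cmp are made up for this illustration and do not come from the question.

```r
# Hypothetical standalone copies of the two columns, built as factors
start_pc <- factor(c("AF2 4RE", "AF3 5RE", "AF1 3DR", "AF2 4RE", "AF2 4PE"))
end_pc   <- factor(c("AF2 4RE", "NA", "AF2 3DR", "AX2 4RE", "AF2 4PE"))

levels(start_pc)      # the lookup table of distinct values for start_pc
as.integer(start_pc)  # the integer codes that index into that table
levels(end_pc)        # a different lookup table for end_pc

# The two level sets are not the same ...
same_levels <- setequal(levels(start_pc), levels(end_pc))

# ... so == on the two factors raises an error; capture its message instead
cmp <- tryCatch(start_pc == end_pc, error = function(e) conditionMessage(e))
```

Here same_levels should come out FALSE, and cmp should hold the error message rather than a logical vector.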

So, convert to character:

  > subset(data, as.character(StartPC)==as.character(EndPC),na.rm=TRUE)
    ID StartPC   EndPC
  1  1 AF2 4RE AF2 4RE
  5  5 AF2 4PE AF2 4PE

You can convert on the fly like this, build your data frame with character columns in the first place, or make sure that both columns are created with the same levels.
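The last two options could look roughly like the sketch below. It reuses the question's vectors; data_chr, data_fac, all_levels, same_chr and same_fac are names invented for this example. Note that in R 4.0.0 and later, stringsAsFactors = FALSE is already the default for data.frame(), so option 1 mainly matters on older versions.

```r
ID      <- c(1, 2, 3, 4, 5)
StartPC <- c("AF2 4RE", "AF3 5RE", "AF1 3DR", "AF2 4RE", "AF2 4PE")
EndPC   <- c("AF2 4RE", "NA", "AF2 3DR", "AX2 4RE", "AF2 4PE")

# Option 1: keep the postcode columns as character vectors
data_chr <- data.frame(ID, StartPC, EndPC, stringsAsFactors = FALSE)
same_chr <- subset(data_chr, StartPC == EndPC)

# Option 2: keep factors, but give both columns one shared set of levels
all_levels <- union(StartPC, EndPC)  # every postcode seen in either column
data_fac <- data.frame(
  ID,
  StartPC = factor(StartPC, levels = all_levels),
  EndPC   = factor(EndPC,   levels = all_levels)
)
same_fac <- subset(data_fac, StartPC == EndPC)

same_chr$ID  # IDs of the matching rows, character version
same_fac$ID  # IDs of the matching rows, factor version
```

Both variants should keep only the rows with IDs 1 and 5. The "NA" in EndPC is a literal string in this example, so it simply fails to match; if it were a real missing value, subset() would treat the resulting NA condition as FALSE and drop that row as well.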

Source: https://habr.com/ru/post/1385151/