Deduplicating factor levels

Suppose I have the following object, which is the dput() output of an invalid factor (for example, printing it complains that level 3 is duplicated):

x <- structure(c(1L, 2L, 3L, 4L), .Label = c("A", "B", "A", "C"),
               class = "factor")
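A quick way to confirm that a factor is malformed without printing it (and triggering the complaint) is to look for repeated entries in its levels. This is a small base-R sketch; dup_at is just an illustrative variable name.

```r
# Rebuild the malformed factor from the question.
x <- structure(c(1L, 2L, 3L, 4L), .Label = c("A", "B", "A", "C"),
               class = "factor")

# anyDuplicated() returns the index of the first repeated level (0 if none),
# so a non-zero result flags an invalid factor.
dup_at <- anyDuplicated(levels(x))
dup_at  # 3
```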

What is the best way, using only base R, to convert it into a valid factor like this one?

structure(c(1L, 2L, 1L, 3L), .Label = c("A", "B", "C"), class = "factor")

I managed to come up with

factor(levels(x)[x])

but I'm not sure this will keep working without warnings in future versions of R, and it is probably also quite inefficient (the real factor I am trying to repair is huge).
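A possibly cheaper base-R alternative works on the short levels vector instead of the full data: deduplicate the levels once with unique(), use match() to find where each old level lands in the new set, and index that small lookup table by the factor's integer codes. This avoids building a full-length character vector that factor() then has to scan again to find and match its levels. The helper name dedup_factor is hypothetical. Two behavioural differences from factor(levels(x)[x]) are worth knowing: the new levels keep their first-appearance order instead of being sorted, and unused levels are kept rather than dropped.

```r
# Hypothetical helper (not from the original post): repair a factor whose
# levels contain duplicates by remapping its integer codes.
dedup_factor <- function(x) {
  lev <- levels(x)
  new_lev <- unique(lev)          # duplicates dropped, first-appearance order
  lookup <- match(lev, new_lev)   # old code -> new code, length nlevels(x)
  codes <- lookup[as.integer(x)]  # NA codes stay NA
  structure(codes, levels = new_lev, class = class(x))
}

x <- structure(c(1L, 2L, 3L, 4L), .Label = c("A", "B", "A", "C"),
               class = "factor")
y <- dedup_factor(x)
identical(y, structure(c(1L, 2L, 1L, 3L), .Label = c("A", "B", "C"),
                       class = "factor"))  # TRUE
```

Because class(x) is reused, an ordered factor stays ordered; any other attributes of x (such as names) are not carried over in this sketch.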

1 answer

Your method looks good and reasonably efficient. To test it, I wrote a function that builds such malformed factors:

bad.factor <- function(nums, labs) {
  structure(nums, .Label = labs, class = "factor")
}
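As a sanity check, bad.factor() should reproduce the small example from the question exactly; small is just an illustrative name.

```r
bad.factor <- function(nums, labs) {
  structure(nums, .Label = labs, class = "factor")
}

# Recreate the malformed factor from the question with the helper.
small <- bad.factor(1:4, c("A", "B", "A", "C"))
x <- structure(c(1L, 2L, 3L, 4L), .Label = c("A", "B", "A", "C"),
               class = "factor")
identical(small, x)  # TRUE
```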

If you build a large test factor with:

x <- bad.factor(1:1000000,gtools::chr(runif(1000000,65,90)))
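gtools::chr() comes from the gtools package rather than base R. Assuming the goal is simply a million random single uppercase letters as (mostly duplicated) levels, comparable test data can be generated with base R alone; the seed is an arbitrary choice for reproducibility.

```r
bad.factor <- function(nums, labs) {
  structure(nums, .Label = labs, class = "factor")
}

n <- 1e6
set.seed(1)  # arbitrary seed, only for reproducibility
# Random single uppercase letters, drawn with base R only.
labs <- sample(LETTERS, n, replace = TRUE)
x <- bad.factor(seq_len(n), labs)
```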

Then run:

microbenchmark::microbenchmark(factor(levels(x)[x]))

Typical output:

 Unit: milliseconds
                 expr      min       lq     mean   median       uq      max neval
 factor(levels(x)[x]) 27.72593 32.98346 42.97813 34.11871 35.70919 105.3564   100
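To compare factor(levels(x)[x]) against a match()-based remapping of the integer codes without the microbenchmark package, base system.time() is enough. The helper dedup_factor is hypothetical and is defined here so the snippet runs on its own; the test data is generated with sample() as an assumed stand-in for the gtools-based labels. The check at the end confirms that both approaches yield the same values, even though their level order differs.

```r
bad.factor <- function(nums, labs) {
  structure(nums, .Label = labs, class = "factor")
}

# Hypothetical helper: remap integer codes through the deduplicated levels.
dedup_factor <- function(x) {
  lev <- levels(x)
  new_lev <- unique(lev)
  structure(match(lev, new_lev)[as.integer(x)],
            levels = new_lev, class = class(x))
}

set.seed(1)
n <- 1e6
x <- bad.factor(seq_len(n), sample(LETTERS, n, replace = TRUE))

t_factor <- system.time(a <- factor(levels(x)[x]))
t_match  <- system.time(b <- dedup_factor(x))
elapsed <- c(factor = t_factor[["elapsed"]], match = t_match[["elapsed"]])
print(elapsed)

# Same values per element; only the order of the levels differs.
identical(as.character(a), as.character(b))  # TRUE
```

Timings depend on the machine and R version, so measure on your own data before choosing one approach.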

Source: https://habr.com/ru/post/1693592/

