Duplicated levels in factors will be banned in April 2017. What about levels<-?

On the R-devel mailing list, Martin Maechler has posted a message about duplicated factor levels:

"factors with non-unique ("duplicated") levels have been deprecated since 2009 -- are more deprecated now ..." (June 4, 2016)

It states that in R 3.4, scheduled for April 2017, duplicated levels will cause an error, not just a warning.

I wonder why the levels<- replacement function does not raise a similar warning. Here I merge the first three levels into "a" in two ways, one of which is deprecated.

Example

> x <- c("a", "b", "c", "d")
> xf <- factor(x, levels = c("a", "b", "c", "d"), 
    labels = c("a", "a", "a", "d"))
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) 
    as.character(labels) else paste0(labels,  :
    duplicated levels in factors are deprecated
> xf <- factor(x)
> levels(xf) <- c("a", "a", "a", "d")
> xf
[1] a a a d
Levels: a d

I would like to understand why R treats the second approach differently from the first.
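The two routes from the question can be put side by side. This is a sketch; it assumes the R 3.4.0 change (as I recall it from the NEWS file) that lets factor() accept duplicated labels and merge them, so on R 3.4.0 or later both routes should give the same factor, while older versions warned on the first route. The variable names xf1 and xf2 are just for illustration.

```r
# Compare the two ways of merging levels "a", "b", "c" into "a".
x <- c("a", "b", "c", "d")

# Route 1: duplicated labels in factor(). On R >= 3.4.0 these are merged;
# older versions emitted the deprecation warning shown above.
xf1 <- suppressWarnings(
  factor(x, levels = c("a", "b", "c", "d"), labels = c("a", "a", "a", "d"))
)

# Route 2: assign duplicated names through levels<-, which merges
# equal names into a single level.
xf2 <- factor(x)
levels(xf2) <- c("a", "a", "a", "d")

print(xf1)
print(xf2)
```

The printed values are identical either way; only the stored levels of xf1 depend on the R version.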

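This is intended behavior: assigning a vector with repeated names through levels<- is the documented way to merge levels, and the examples in the ?levels help page do exactly that: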

## combine some levels
z <- gl(3, 2, 12, labels = c("apple", "salad", "orange"))
z
levels(z) <- c("fruit", "veg", "fruit")
z
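To see what the help-page example actually produces, the same lines can be run with a check on the result. The two "fruit" entries should collapse into one level, with "veg" keeping its own:

```r
# Reproduce the ?levels example and inspect the merged factor.
z <- gl(3, 2, 12, labels = c("apple", "salad", "orange"))
levels(z) <- c("fruit", "veg", "fruit")

print(levels(z))   # [1] "fruit" "veg"
print(table(z))    # "apple" and "orange" are now counted together as "fruit"
```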

To see why, it helps to look at how a factor is stored and what its levels are.

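A factor is an integer vector with a levels attribute. unclass() removes the "factor" class and exposes the integer codes: each code is a position in the levels vector, so 1 stands for "a", 2 for "b", and so on.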

x <- c(letters[1:3], letters[1:3])
xf <- factor(x)

xf
# [1] a b c a b c
# Levels: a b c

attributes(xf)
# $levels
# [1] "a" "b" "c" 
# 
# $class
# [1] "factor"

unclass(xf)
# [1] 1 2 3 1 2 3
# attr(,"levels")
# [1] "a" "b" "c"
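The codes-plus-lookup-table picture can be checked directly: indexing the levels vector by the integer codes should rebuild the original character vector. A minimal sketch:

```r
x <- c(letters[1:3], letters[1:3])
xf <- factor(x)

codes <- as.integer(xf)   # stored integer codes: 1 2 3 1 2 3
labs  <- levels(xf)       # lookup table: "a" "b" "c"

# Each code is a position in the levels, so this recovers the data.
rebuilt <- labs[codes]
print(identical(rebuilt, x))   # [1] TRUE
```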

When factor() is given explicit levels, every value that does not appear among them becomes NA:

factor(c("a", "b", "c"), levels = c("e", "f", "g"))
# [1] <NA> <NA> <NA>
#   Levels: e f g
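Put differently, the codes factor() stores behave like match() of the data against the levels. A small sketch with one value that does match (the vectors are illustrative, not from the original):

```r
x   <- c("a", "b", "e")
lev <- c("e", "f", "g")

f <- factor(x, levels = lev)

# Values absent from lev get an NA code; "e" gets code 1.
print(as.integer(f))                              # NA NA 1
print(identical(as.integer(f), match(x, lev)))    # [1] TRUE
```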

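The labels argument is optional and renames the levels. It is matched to levels by position, not by value; here "e" is the only value present in the data, so it becomes "h":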

factor(c("a", "b", "e"), levels = c("e", "f", "g"), labels = c("h", "i", "j"))
# [1] <NA> <NA> h   
# Levels: h i j
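Because labels are applied by position, passing labels should be equivalent to building the factor first and then renaming its levels with levels<-. A sketch comparing the two (f1 and f2 are illustrative names):

```r
x <- c("a", "b", "e")

# Route 1: rename through the labels argument of factor().
f1 <- factor(x, levels = c("e", "f", "g"), labels = c("h", "i", "j"))

# Route 2: create the factor, then rename its levels by position.
f2 <- factor(x, levels = c("e", "f", "g"))
levels(f2) <- c("h", "i", "j")

print(identical(f1, f2))   # [1] TRUE
```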

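levels<- works differently: it does not look at the data at all. Whatever you assign replaces the existing level names position by position, the first new name replacing the first old level, and so on. Repeated names are then merged into a single level.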

xf
# [1] a b c a b c
# Levels: a b c

The following assignment replaces "a" with "e", "b" with "f", and "c" with "g", and the result is as expected:

levels(xf) <- c("e", "f", "g", "e", "f", "g")
xf
# [1] e f g e f g
# Levels: e f g
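The positional rule makes the result predictable: old level i maps to new[i], so indexing the new names by the integer codes should give the renamed values. A sketch, assuming the assigned vector has at least as many entries as xf has levels (levels<- stops with "number of levels differs" otherwise); `new` and `expected` are illustrative names:

```r
xf  <- factor(c(letters[1:3], letters[1:3]))
new <- c("e", "f", "g", "e", "f", "g")

# Predict the outcome before assigning: code i picks new[i].
expected <- new[as.integer(xf)]

levels(xf) <- new
print(identical(as.character(xf), expected))   # [1] TRUE
print(levels(xf))                              # [1] "e" "f" "g"
```

The same prediction explains the next example: with new names "m", "m", "m", "n", "n", "n", codes 1 to 3 all pick "m".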

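Note that the vector assigned above is longer than the number of levels in xf. Only its first three entries determine the mapping, because the codes shown by unclass(xf) only go from 1 to 3; entries beyond the third never receive any elements, although new names among them still appear as (unused) levels. That is why the following assignment turns every element into "m" and leaves "n" as an unused level: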

levels(xf) <- c("m", "m", "m", "n", "n", "n")
xf
# [1] m m m m m m
# Levels: m n
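Two follow-ups, sketched with base R functions: droplevels() removes the unused "n" level, and the named-list form of levels<- (also shown on the ?levels help page) maps old names to new ones explicitly, so the merge no longer depends on the order of the existing levels. The variable names are illustrative.

```r
xf <- factor(c(letters[1:3], letters[1:3]))
levels(xf) <- c("m", "m", "m", "n", "n", "n")

# "n" is still a level even though no element uses it.
counts <- table(xf)
xf_dropped <- droplevels(xf)
print(levels(xf_dropped))   # [1] "m"

# Named-list form: each name lists the old levels it absorbs.
yf <- factor(c(letters[1:3], letters[1:3]))
levels(yf) <- list(m = c("a", "b"), n = "c")
print(as.character(yf))     # "m" "m" "n" "m" "m" "n"
```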

Source: https://habr.com/ru/post/1652088/

