Duplicate row names in R using as.data.frame()

In R data frames, row names must be unique. For example, this fails:

df <- mtcars
rownames(df) <- rep("duplicate!", nrow(df))
> Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
>   duplicate 'row.names' are not allowed
> In addition: Warning message:
> non-unique value when setting 'row.names': ‘duplicate!’ 

or

df <- data.frame(mtcars, row.names=rep("duplicate!", nrow(mtcars)))
> Error in data.frame(mtcars, row.names = rep("duplicate!", nrow(mtcars))) : 
>   duplicate row.names: duplicate!

What is the motivation for the following behavior of as.data.frame()? Is it intentional or a bug?

m <- as.matrix(mtcars)
rownames(m) <- rep("duplicate!", nrow(m))
df <- as.data.frame(m)

Result:

any(duplicated(rownames(df)))  # == TRUE
nrow(df)  # == 32
length(unique(rownames(df)))  # == 1
df["duplicate!", ]  # returns a single row...
>            mpg cyl disp  hp drat   wt  qsec vs am gear carb
> duplicate!  21   6  160 110  3.9 2.62 16.46  0  1    4    4

(Run with R version 3.4.3 (2017-11-30))

1 answer

Yes. As Martin Plummer confirmed on the official R-devel mailing list (https://stat.ethz.ch/mailman/listinfo/r-devel/), this is a bug, and I will most likely commit a change to the sources soon to fix it.
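
Until a fix is available in the R version you run, one possible workaround is to make the matrix row names unique before converting. The sketch below is not part of the original answer; it assumes base R's make.unique() is acceptable for your data, and the names m_fixed and df_fixed are purely illustrative.

```r
# Reproduce the matrix with duplicated row names from the question
m <- as.matrix(mtcars)
rownames(m) <- rep("duplicate!", nrow(m))

# Workaround sketch (an assumption, not the official fix):
# de-duplicate the row names before calling as.data.frame()
m_fixed <- m
rownames(m_fixed) <- make.unique(rownames(m_fixed))
df_fixed <- as.data.frame(m_fixed)

# Every row can now be addressed unambiguously by name
anyDuplicated(rownames(df_fixed))  # 0 means no duplicates
head(rownames(df_fixed), 3)
```

make.unique() appends a numeric suffix (".1", ".2", and so on) to repeated values, so the first occurrence keeps its original name. If the original labels matter for later lookups, it may be safer to store them in a regular column instead.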


Source: https://habr.com/ru/post/1694246/
