As @John mentioned, there are problems with using duplicated. I would add that wrapping the data.frame coerces every column to a single data type before duplicated compares them. Here is an example data.frame:
df <- data.frame(a = LETTERS[1:3],
                 b = 1:3,
                 c = as.character(1:3),
                 d = LETTERS[1:3],
                 e = 1:3,
                 f = 1:3)
df
Note that column c is very similar to columns b, e and f, but not identical, because the types differ (character versus number). The solution proposed by @Jubbles will ignore these differences.
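To make the type issue concrete, here is a minimal sketch. It assumes the wrapping in question behaves like t(), which goes through as.matrix(); that choice, the stringsAsFactors = FALSE argument and the name dup_after_t are my own additions for illustration, not part of the answers referred to above.

```r
# Same toy data as in the example; stringsAsFactors = FALSE is added here
# only so the result does not depend on the R version.
df <- data.frame(a = LETTERS[1:3], b = 1:3, c = as.character(1:3),
                 d = LETTERS[1:3], e = 1:3, f = 1:3,
                 stringsAsFactors = FALSE)

# identical() is strict about type: integer b is not the same as character c.
identical(df$b, df$c)   # FALSE

# `==` coerces the integers to character first, so the difference vanishes.
all(df$b == df$c)       # TRUE

# Assumed wrapper: t() converts the data.frame to a single character matrix,
# so after the conversion b and c are indistinguishable rows.
dup_after_t <- duplicated(t(df))
as.vector(dup_after_t)
```

With the conversion in place, column c is flagged as a duplicate of b even though the two columns hold different types.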
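Instead, I would compare the columns with identical, which respects the types stored in the data.frame. You can use outer to build the full matrix of pairwise comparisons: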
are.cols.identical <- function(col1, col2) identical(df[,col1], df[,col2])
identical.mat <- outer(colnames(df), colnames(df),
FUN = Vectorize(are.cols.identical))
identical.mat
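Vectorize is needed because outer calls FUN once with two long vectors and expects a result of the same length, whereas identical returns a single TRUE or FALSE. As a rough alternative sketch (not the method above), a nested sapply over the columns gives the same comparisons with row and column names attached; identical.named is a hypothetical name, and stringsAsFactors = FALSE is added only for reproducibility.

```r
df <- data.frame(a = LETTERS[1:3], b = 1:3, c = as.character(1:3),
                 d = LETTERS[1:3], e = 1:3, f = 1:3,
                 stringsAsFactors = FALSE)

# For every column x, compare it against every column y with identical().
# sapply() simplifies the result to a 6 x 6 logical matrix whose dimnames
# are the column names of df.
identical.named <- sapply(df, function(x) {
  sapply(df, function(y) identical(x, y))
})
identical.named
```

The named version is easier to read when the data.frame has many columns, since each TRUE can be traced back to a pair of column names.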
From here you can group the identical columns, for example with hierarchical clustering:
# as.dist(), hclust() and cutree() are in the stats package, so no extra library is needed
distances <- as.dist(!identical.mat)
tree <- hclust(distances)
cut <- cutree(tree, h = 0.5)
cut
# [1] 1 2 3 1 2 2
split(colnames(df), cut)
# $`1`
# [1] "a" "d"
#
# $`2`
# [1] "b" "e" "f"
#
# $`3`
# [1] "c"
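Edit 1: To ignore small differences, such as floating-point error or integer versus double storage, compare with all.equal instead of identical: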
are.cols.identical <- function(col1, col2) isTRUE(all.equal(df[,col1], df[,col2]))
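A small sketch of how the two comparisons differ, and why the isTRUE wrapper matters. The vectors ints and dbls are made up for illustration.

```r
ints <- 1:3          # integer storage
dbls <- c(1, 2, 3)   # double storage, same values

# identical() distinguishes the storage types.
identical(ints, dbls)            # FALSE

# all.equal() compares numeric values within a tolerance.
isTRUE(all.equal(ints, dbls))    # TRUE

# When the values differ, all.equal() does not return FALSE; it returns a
# character description of the difference, hence the isTRUE() wrapper.
diff_report <- all.equal(1, 1.1)
is.character(diff_report)        # TRUE
```

Note that all.equal still reports a mismatch between a character column and a numeric one, so this relaxes only numeric comparisons.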
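Edit 2: A simpler alternative to clustering is to label each column with the index of the first column it is identical to: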
cut <- apply(identical.mat, 1, function(x)match(TRUE, x))
split(colnames(df), cut)
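If the goal is to drop the redundant columns, the group labels can be used to keep one representative per group. This is only a sketch under that assumption; the names group, keep and df.unique are hypothetical, and stringsAsFactors = FALSE is added for reproducibility.

```r
df <- data.frame(a = LETTERS[1:3], b = 1:3, c = as.character(1:3),
                 d = LETTERS[1:3], e = 1:3, f = 1:3,
                 stringsAsFactors = FALSE)

are.cols.identical <- function(col1, col2) identical(df[, col1], df[, col2])
identical.mat <- outer(colnames(df), colnames(df),
                       FUN = Vectorize(are.cols.identical))

# Label each column with the index of the first column it is identical to.
group <- apply(identical.mat, 1, function(x) match(TRUE, x))

# Keep only the first column of every group; drop = FALSE keeps a data.frame
# even if a single column remains.
keep <- !duplicated(group)
df.unique <- df[, keep, drop = FALSE]
colnames(df.unique)
```

For the example data this keeps a, b and c, and c survives because its type differs from b.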