Unique on a data frame with selected columns only

Question

I have a data frame with more than 100 columns, and I would like to find the unique rows by comparing only two of the columns. I hope this is an easy one, but I cannot get it to work with unique or duplicated myself.

In the example below, I would like the uniqueness check to use only id and id2:

    data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))

     id id2 somevalue
      1   1         x
      1   1         y
      3   4         z

I would like to get either:

     id id2 somevalue
      1   1         x
      3   4         z

or

     id id2 somevalue
      1   1         y
      3   4         z

(I have no preference as to which of the duplicate rows is kept.)

+50

r unique

Ina Mar 30

4 answers

Using unique():

    dat <- data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))
    dat[row.names(unique(dat[,c("id", "id2")])),]

+9

Gary Feng Oct 22 '15 at 18:35

Here are a couple of dplyr options that keep the non-duplicate rows based on the id and id2 columns:

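    library(dplyr)

    # df is assumed to be the example data frame from the question
    df <- data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))

    df %>% distinct(id, id2, .keep_all = TRUE)
    df %>% group_by(id, id2) %>% filter(row_number() == 1)
    df %>% group_by(id, id2) %>% slice(1)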

+3

sbha Jul 17 '18 at 18:37

A minor update to @joran's code.
With the code below, you avoid the ambiguity and get only the two columns:

    dat <- data.frame(id=c(1,1,3), id2=c(1,1,4), somevalue=c("x","y","z"))
    dat[row.names(unique(dat[,c("id", "id2")])), c("id", "id2")]
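If only the two key columns are wanted, going through the row names may not be needed: calling unique() on the column subset should give the same rows for this example. The following is a small sketch of that comparison; the names key, via_rownames and direct are illustrative and not from the answer above.

```r
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4),
                  somevalue = c("x", "y", "z"))

# Columns that define a duplicate (illustrative helper variable)
key <- c("id", "id2")

# The approach from the answer: look up unique row names, then keep two columns
via_rownames <- dat[row.names(unique(dat[, key])), key]

# Shorter alternative: unique() applied directly to the two columns
direct <- unique(dat[, key])

print(via_rownames)
print(direct)
```

The row-name lookup only becomes necessary when the other columns (such as somevalue) have to come along with the result.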

0

Vaya Ashish Jul 10 '18 at 15:13

joran · Accepted Answer · 2012-03-30 14:38

Well, if it doesn't matter which value of the non-duplicated column is kept, this should be pretty simple:

    dat <- data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))
    > dat[!duplicated(dat[,c('id','id2')]),]
      id id2 somevalue
    1  1   1         x
    3  3   4         z

Inside the duplicated call, I pass only those columns of dat that I don't want duplicates of. This code will always automatically select the first of any ambiguous values. (In this case, x.)
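Since the question accepts either x or y for the duplicated pair, here is a small sketch of how the other variant could be obtained. It relies on the fromLast argument of base R's duplicated(); the names key, first_kept and last_kept are illustrative and not part of the original answer.

```r
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4),
                  somevalue = c("x", "y", "z"))

# Columns that define a duplicate (illustrative helper variable)
key <- c("id", "id2")

# Default: the first row of each id/id2 combination is kept
first_kept <- dat[!duplicated(dat[, key]), ]

# fromLast = TRUE scans from the bottom, so the last row of each
# id/id2 combination is kept instead
last_kept <- dat[!duplicated(dat[, key], fromLast = TRUE), ]

print(first_kept)
print(last_kept)
```

For the example data, first_kept holds the x row and last_kept holds the y row; the z row appears in both because its id/id2 pair occurs only once.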
