Unique on a data frame with selected columns only

I have a data frame with more than 100 columns, and I would like to find the unique rows by comparing only two of the columns. I hope this is an easy one, but I can't get it to work with unique or duplicated myself.

In the example below, I would like to determine uniqueness using only id and id2:

    data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))

    id id2 somevalue
     1   1         x
     1   1         y
     3   4         z

I would like to get either:

    id id2 somevalue
     1   1         x
     3   4         z

or

    id id2 somevalue
     1   1         y
     3   4         z

(I have no preference as to which of the duplicate rows is kept.)

Mar 30
4 answers

Well, if it doesn't matter which value of the non-duplicated column you keep, this should be pretty simple:

    > dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))
    > dat[!duplicated(dat[, c("id", "id2")]), ]
      id id2 somevalue
    1  1   1         x
    3  3   4         z

Inside the duplicated() call, I simply pass only those columns of dat that I don't want duplicated. This code will always keep the first of any ambiguous rows (in this case, x).
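To get the second output from the question (keeping y instead of x), duplicated() also accepts fromLast = TRUE, which scans from the bottom so that the last occurrence of each combination is the one kept. A minimal sketch wrapping both behaviours; the helper name distinct_rows_by and its keep_last argument are hypothetical and not part of base R:

```r
# distinct_rows_by() is a hypothetical helper, not part of base R.
# It keeps one row per unique combination of the columns named in `cols`.
distinct_rows_by <- function(df, cols, keep_last = FALSE) {
  # duplicated() on a data frame compares whole rows of the column subset;
  # fromLast = TRUE scans from the bottom, so the last occurrence is kept
  is_dup <- duplicated(df[, cols, drop = FALSE], fromLast = keep_last)
  df[!is_dup, , drop = FALSE]
}

dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))

first <- distinct_rows_by(dat, c("id", "id2"))                    # keeps the "x" row
last  <- distinct_rows_by(dat, c("id", "id2"), keep_last = TRUE)  # keeps the "y" row
print(first)
print(last)
```

Because only the key columns are passed to duplicated(), the other 100+ columns never take part in the comparison; they are simply carried along by the row index.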

Mar 30 '12 at 14:38

Using unique():

    dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))
    dat[row.names(unique(dat[, c("id", "id2")])), ]
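This works because unique() on a data frame keeps the first occurrence of each row together with its original row name, and indexing dat with a character vector selects rows by name. A small sketch that makes the intermediate step visible and compares the selected rows with the !duplicated() answer above (the variable names keys, by_unique and by_duplicated are just for illustration):

```r
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))

# unique() on the two key columns keeps the first row of each combination,
# along with that row's original row name
keys <- unique(dat[, c("id", "id2")])
print(row.names(keys))

# Indexing by those row names pulls the full rows back out of dat
by_unique <- dat[row.names(keys), ]

# The !duplicated() approach, for comparison
by_duplicated <- dat[!duplicated(dat[, c("id", "id2")]), ]

print(by_unique)
print(by_duplicated)
```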
Oct 22 '15 at 18:35

Here are a couple of dplyr options that keep non-duplicate rows based on the id and id2 columns:

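    library(dplyr)

    df <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))

    # keep the first row of each id/id2 combination, with all other columns
    df %>% distinct(id, id2, .keep_all = TRUE)

    df %>% group_by(id, id2) %>% filter(row_number() == 1)

    df %>% group_by(id, id2) %>% slice(1)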
Jul 17 '18 at 18:37

A minor update to @Joran's code.
Using the code below, you can avoid the ambiguity and get only the two key columns:

    dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))
    dat[row.names(unique(dat[, c("id", "id2")])), c("id", "id2")]
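When only id and id2 are kept, there is nothing left to be ambiguous, so the row-name lookup is arguably not needed: unique() on the two-column subset already returns the same rows and columns. A minimal sketch comparing the two (the names via_rownames and direct are illustrative only):

```r
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))

# Row-name approach from this answer: returns only id and id2
via_rownames <- dat[row.names(unique(dat[, c("id", "id2")])), c("id", "id2")]

# unique() on the two key columns directly
direct <- unique(dat[, c("id", "id2")])

print(via_rownames)
print(direct)
```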
Jul 10 '18 at 15:13