Select multiple duplicate rows [AB & BA]

Question

I have an input data frame:

df <- data.frame(a = c(1,1,2,4,3,5,2,1,1,3), b = c(4,3,3,1,2,2,4,4,4,2), d = LETTERS[1:10])

I want to get:

out <- data.frame(a = c(1,2,4,3,1,1,3), b = c(4,3,1,2,4,4,2), d = c('A','C','D','E','H','I','J'))

#   a b d
# 1 1 4 A
# 2 2 3 C
# 3 4 1 D
# 4 3 2 E
# 5 1 4 H
# 6 1 4 I
# 7 3 2 J

I want to extract every row whose (a, b) pair occurs more than once, treating the reversed order (b, a) as the same pair.

I tried df[duplicated(df[c("a")]) | duplicated(df[c("b")]) ,], but it does not work.

Any suggestion?

r subset

Effebi 21 sept '16 at 10:55

3 answers

Psidom · Answer 1 · 2016-09-21T23:10:59+0000

You can group the data frame by the order-independent values of columns a and b, computed with pmin and pmax, and then keep only the groups that contain at least two rows:

library(dplyr)
df %>% 
       group_by(pmin(a,b), pmax(a,b)) %>% 
       filter(n() >= 2) %>% 
       ungroup() %>% 
       select(a,b,d)

# Source: local data frame [7 x 3]
# 
#       a     b      d
#   <dbl> <dbl> <fctr>
# 1     1     4      A
# 2     2     3      C
# 3     4     1      D
# 4     3     2      E
# 5     1     4      H
# 6     1     4      I
# 7     3     2      J
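
To make the grouping step more concrete, here is a minimal base R sketch of the same idea without dplyr. The names lo, hi, key, group_size and res are illustrative and not part of the original answer; the sketch assumes the values of a and b can be pasted into a text key without ambiguity, which holds for the small integers used here.

```r
df <- data.frame(a = c(1,1,2,4,3,5,2,1,1,3),
                 b = c(4,3,3,1,2,2,4,4,4,2),
                 d = LETTERS[1:10])

# Order-independent key: smaller value first, larger value second,
# so that (1, 4) and (4, 1) both become "1 4"
lo  <- pmin(df$a, df$b)
hi  <- pmax(df$a, df$b)
key <- paste(lo, hi)

# Number of rows that share each key, aligned with the rows of df
group_size <- ave(seq_along(key), key, FUN = length)

# Keep rows whose key occurs at least twice
res <- df[group_size >= 2, ]
res
```

Here ave plays the role of group_by followed by n(): it returns, for every row, the size of the group that row belongs to.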

Hubertl · Answer 2 · 2016-09-21T23:26:31+0000

In base R, you can use duplicated along with apply:

df[(duplicated(df$a)&duplicated(df$b))|
   apply(df,1, function(l) sum((l[["a"]]==df$b)&(l[["b"]]==df$a))>0),]

   a b d
1  1 4 A
3  2 3 C
4  4 1 D
5  3 2 E
7  2 4 G
8  1 4 H
9  1 4 I
10 3 2 J
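
Note that this output also contains row 7 (G), which is not in the requested result: duplicated(df$a) & duplicated(df$b) tests each column on its own, so a row is kept when its a value and its b value have each appeared earlier, even in different rows. A possible variant, sketched here with the hypothetical names is_match and res, compares each row against all rows as a pair, in both orders:

```r
df <- data.frame(a = c(1,1,2,4,3,5,2,1,1,3),
                 b = c(4,3,3,1,2,2,4,4,4,2),
                 d = LETTERS[1:10])

# For each row i, count the rows that hold the same pair in either order.
# The row always matches itself, so a count above 1 means a real duplicate.
is_match <- vapply(seq_len(nrow(df)), function(i) {
  same    <- df$a == df$a[i] & df$b == df$b[i]
  swapped <- df$a == df$b[i] & df$b == df$a[i]
  sum(same | swapped) > 1
}, logical(1))

res <- df[is_match, ]
res
```

This compares every row with every other row, so it is only a reasonable choice for small data frames.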

RS · Answer 3 · 2016-09-22T06:42:00+0000

You can also use the parallel maximum and minimum (pmax and pmin) to normalize the order within each row, then find duplicated rows scanning from the first row and from the last row, and combine the two results. Although this is a longer solution, it may be of interest:

df <- data.frame(a = c(1,1,2,4,3,5,2,1,1,3), b = c(4,3,3,1,2,2,4,4,4,2), d = LETTERS[1:10])

out <- data.frame(a = c(1,2,4,3,1,1,3), b = c(4,3,1,2,4,4,2), d = c('A','C','D','E','H','I','J'))    


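# Order-independent key: larger value in mx, smaller value in mn
mx <- with(df, pmax(a, b))
mn <- with(df, pmin(a, b))

df2 <- data.frame(mx, mn)
df2

# Rows that repeat an earlier key, and rows that repeat a later key
a <- df[duplicated(df2), ]
b <- df[duplicated(df2, fromLast = TRUE), ]

# Union of both sets, put back into the original order via d
res <- merge(a, b, all = TRUE)
res <- res[order(res$d), ]

res
out

# Check: number of mismatched labels (0 means res matches out)
sum(as.character(res$d) != as.character(out$d))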
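
The two duplicated calls can also be combined with a logical OR instead of merge, which keeps the original row order and row names. This is a sketch of the same idea, not the answerer's code; keys, is_dup and res2 are illustrative names.

```r
df <- data.frame(a = c(1,1,2,4,3,5,2,1,1,3),
                 b = c(4,3,3,1,2,2,4,4,4,2),
                 d = LETTERS[1:10])

# Order-independent key columns
keys <- data.frame(mn = pmin(df$a, df$b), mx = pmax(df$a, df$b))

# A row belongs to a duplicated group if its key was seen before it
# (scan from the top) or is seen after it (scan from the bottom)
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

res2 <- df[is_dup, ]
res2
```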
