Select multiple duplicate rows [AB & BA]

I have an input data frame:

df <- data.frame(a = c(1,1,2,4,3,5,2,1,1,3), b = c(4,3,3,1,2,2,4,4,4,2), d = LETTERS[1:10])

I want to get:

out <- data.frame(a = c(1,2,4,3,1,1,3), b = c(4,3,1,2,4,4,2), d = c('A','C','D','E','H','I','J'))

#   a b d
# 1 1 4 A
# 2 2 3 C
# 3 4 1 D
# 4 3 2 E
# 5 1 4 H
# 6 1 4 I
# 7 3 2 J

I want to extract every row whose (a, b) pair occurs more than once, where a pair in reverse order, such as (1, 4) and (4, 1), also counts as a duplicate.

I tried df[duplicated(df[c("a")]) | duplicated(df[c("b")]) ,], but it does not work.

Any suggestion?

3 answers

You can group the data frame by the order-independent pair built from columns a and b with pmin and pmax, and then keep only the groups that contain at least two rows:

library(dplyr)
df %>%
  group_by(pmin(a, b), pmax(a, b)) %>%
  filter(n() >= 2) %>%
  ungroup() %>%
  select(a, b, d)

# Source: local data frame [7 x 3]
# 
#       a     b      d
#   <dbl> <dbl> <fctr>
# 1     1     4      A
# 2     2     3      C
# 3     4     1      D
# 4     3     2      E
# 5     1     4      H
# 6     1     4      I
# 7     3     2      J
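
To see why grouping on pmin and pmax works, it can help to look at the keys directly. The following is a minimal base R sketch on the question's data; the names lo, hi, key and n are illustrative and are not part of the answer above.

```r
# Sample data from the question.
df <- data.frame(a = c(1,1,2,4,3,5,2,1,1,3),
                 b = c(4,3,3,1,2,2,4,4,4,2),
                 d = LETTERS[1:10])

lo  <- pmin(df$a, df$b)   # row-wise smaller value of a and b
hi  <- pmax(df$a, df$b)   # row-wise larger value of a and b
key <- paste(lo, hi)      # (1, 4) and (4, 1) both give the key "1 4"

# Size of the group each row belongs to; rows with n >= 2 are the ones kept.
n <- as.integer(table(key)[key])

cbind(df, lo, hi, n)
```

Rows B, F and G are the only ones whose key occurs once, which is why they are dropped by filter(n() >= 2).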

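In base R, you can use duplicated along with apply. Note that duplicated(df$a) & duplicated(df$b) tests each column separately, so the result below also contains row 7 (G), which is not in the expected output: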

df[(duplicated(df$a)&duplicated(df$b))|
   apply(df,1, function(l) sum((l[["a"]]==df$b)&(l[["b"]]==df$a))>0),]

   a b d
1  1 4 A
3  2 3 C
4  4 1 D
5  3 2 E
7  2 4 G
8  1 4 H
9  1 4 I
10 3 2 J
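
Row 7 (2, 4, G) shows up above even though no other row holds the pair (2, 4) or (4, 2): its a value and its b value each occur earlier, but in different rows. One possible repair, keeping the same "same order or reversed order" idea, is to test the two columns together. This is a sketch rather than the answerer's code; ab, same_order and reversed are illustrative names.

```r
# Sample data from the question.
df <- data.frame(a = c(1,1,2,4,3,5,2,1,1,3),
                 b = c(4,3,3,1,2,2,4,4,4,2),
                 d = LETTERS[1:10])

ab <- df[c("a", "b")]

# Rows whose (a, b) pair is repeated in the same order.
same_order <- duplicated(ab) | duplicated(ab, fromLast = TRUE)

# Rows whose pair also occurs reversed somewhere in the data.
# Pairs with a == b are skipped here because they would match themselves;
# genuine repeats of such pairs are already caught by same_order.
reversed <- mapply(function(x, y) x != y && any(df$a == y & df$b == x),
                   df$a, df$b)

res <- df[same_order | reversed, ]
res
```

Using mapply over the two numeric columns also avoids the character conversion that apply performs on a data frame with mixed column types.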

You can also use the parallel maximum and minimum (pmax and pmin) to normalize the order of the values within each row, then find the duplicated rows scanning from the first row and from the last row, and combine the two results. Although this is a long solution, it may be of interest:

df <- data.frame(a = c(1,1,2,4,3,5,2,1,1,3), b = c(4,3,3,1,2,2,4,4,4,2), d = LETTERS[1:10])

out <- data.frame(a = c(1,2,4,3,1,1,3), b = c(4,3,1,2,4,4,2), d = c('A','C','D','E','H','I','J'))    


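mx <- with(df, pmax(a, b))   # larger value in each row
mn <- with(df, pmin(a, b))   # smaller value in each row

# Order-independent key: (4, 1) and (1, 4) both become mx = 4, mn = 1
df2 <- data.frame(mx, mn)
df2

# Rows that repeat an earlier key, and rows that repeat a later key
a <- df[duplicated(df2), ]
b <- df[duplicated(df2, fromLast = TRUE), ]

# Union of the two sets, put back in the original order using column d
res <- merge(a, b, all = TRUE)
res <- res[order(res$d), ]

res
out

# check: number of mismatches between res and out
sum(as.character(res$d) != as.character(out$d))
# 0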
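
The merge and re-sort steps can probably be avoided by combining the two duplicated calls into one logical index. This is a shorter sketch of the same idea, not the answerer's code; key and keep are illustrative names.

```r
# Sample data from the question.
df <- data.frame(a = c(1,1,2,4,3,5,2,1,1,3),
                 b = c(4,3,3,1,2,2,4,4,4,2),
                 d = LETTERS[1:10])

# Order-independent key for each row.
key <- data.frame(lo = pmin(df$a, df$b), hi = pmax(df$a, df$b))

# TRUE for every row whose key occurs at least twice: duplicated() marks all
# but the first occurrence, and fromLast = TRUE marks all but the last.
keep <- duplicated(key) | duplicated(key, fromLast = TRUE)

res <- df[keep, ]
res
```

Because the rows are selected by a logical index, they stay in their original order and keep their original row names.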

Source: https://habr.com/ru/post/1655406/

