Removing duplicates and smaller vectors from a list

I have a list of vectors, say:

li <- list( c(1, 2, 3), c(1, 2, 3, 4), c(2, 3, 4), c(5, 6, 7, 8, 9, 10, 11, 12), numeric(0), c(5, 6, 7, 8, 9, 10, 11, 12, 13) ) 

I would like to remove every vector that is already contained in another vector of equal or greater length, as well as all empty vectors.

In this case, the resulting list should contain only two vectors:

 1 2 3 4
 5 6 7 8 9 10 11 12 13

Is there a convenient function to achieve this?

Thank you in advance

+6
2 answers

First, sort the list by vector length. This guarantees that, inside the removal loop, every vector at a lower index is no longer than every vector at a higher index, so a one-directional setdiff() check is enough.

    l <- list(1:3, 1:4, 2:4, 5:12, double(), 5:13)
    ls <- l[order(sapply(l, length))]  # shortest vectors first
    i <- 1
    while (i <= length(ls) - 1) {
      # drop ls[[i]] if it is empty or a subset of any later vector
      if (length(ls[[i]]) == 0 ||
          any(sapply((i + 1):length(ls),
                     function(i2) length(setdiff(ls[[i]], ls[[i2]]))) == 0)) {
        ls[[i]] <- NULL  # removal shifts the next element into position i
      } else {
        i <- i + 1
      }
    }
    ls
    ## [[1]]
    ## [1] 1 2 3 4
    ##
    ## [[2]]
    ## [1] 5 6 7 8 9 10 11 12 13

Here is a slight variation that replaces any(sapply(...)) with a second while loop. The advantage is that the inner loop can stop early, as soon as it finds a superset in the rest of the list.

    l <- list(1:3, 1:4, 2:4, 5:12, double(), 5:13)
    ls <- l[order(sapply(l, length))]  # shortest vectors first
    i <- 1
    while (i <= length(ls) - 1) {
      if (length(ls[[i]]) == 0 || {
            # scan the later vectors and stop at the first superset
            j <- i + 1
            res <- FALSE
            while (j <= length(ls)) {
              if (length(setdiff(ls[[i]], ls[[j]])) == 0) {
                res <- TRUE
                break
              } else {
                j <- j + 1
              }
            }
            res
          }) {
        ls[[i]] <- NULL
      } else {
        i <- i + 1
      }
    }
    ls
    ## [[1]]
    ## [1] 1 2 3 4
    ##
    ## [[2]]
    ## [1] 5 6 7 8 9 10 11 12 13
+2

x is contained in y if

 length(setdiff(x, y)) == 0 

You can apply this test to every pair of vectors, generating the pairs with functions such as expand.grid or combn.
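As a rough sketch of how the pairwise test could be combined with expand.grid: the helper names is_contained and drop_contained below are hypothetical, not part of base R. The rule that keeps the earlier of two set-equal vectors is also an assumption, added so that equal vectors do not eliminate each other.

```r
li <- list(c(1, 2, 3), c(1, 2, 3, 4), c(2, 3, 4),
           c(5, 6, 7, 8, 9, 10, 11, 12), numeric(0),
           c(5, 6, 7, 8, 9, 10, 11, 12, 13))

# Containment test from the answer: TRUE if every element of x is in y.
is_contained <- function(x, y) length(setdiff(x, y)) == 0

# Hypothetical helper: drop empty vectors and vectors contained in another.
drop_contained <- function(vs) {
  vs <- vs[lengths(vs) > 0]            # remove empty vectors first
  n <- length(vs)
  if (n < 2) return(vs)

  # All ordered pairs (i, j) with i != j.
  pairs <- expand.grid(i = seq_len(n), j = seq_len(n))
  pairs <- pairs[pairs$i != pairs$j, ]

  # Vector i is redundant if it is contained in vector j.
  # Assumption: when the two are set-equal, only the later one is dropped.
  redundant <- mapply(function(i, j) {
    is_contained(vs[[i]], vs[[j]]) &&
      (!is_contained(vs[[j]], vs[[i]]) || j < i)
  }, pairs$i, pairs$j)

  vs[!(seq_len(n) %in% pairs$i[redundant])]
}

result <- drop_contained(li)
```

For the list in the question, result holds the two vectors c(1, 2, 3, 4) and c(5, ..., 13). This version compares every pair, so it does quadratic work but does not depend on sorting the list first.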

0

Source: https://habr.com/ru/post/989216/