Find all disjoint (non-overlapping) sets from the set of sets

My problem: I need to find all the disjoint (non-overlapping) sets from the set of sets.

Background: I use comparative phylogenetic methods to study the evolution of traits in birds. I have a tree with ~ 300 species. This tree can be divided into subclades (i.e., subtrees). If two subclades do not share the species, they are independent. I am looking for an algorithm (and an implementation of R, if possible) that will find all possible subclass sections, where each subclass has more than 10 taxa, and they are all independent. Each subclass can be considered a set, and when two subclasses are independent (do not share views), these subclades are then disjoint sets.

Hope this is clear and someone can help.

Cheers, Glenn

The following code creates an example data set. Where subclades is a list of all possible linings (sets) from which I would like to select X disjoint sets, where the length of the set is Y.

################################### # Example Dataset ################################### library(ape) library(phangorn) library(TreeSim) library(phytools) ##simulate a tree n.taxa <- 300 tree <- sim.bd.taxa(n.taxa,1,lambda=.5,mu=0)[[1]][[1]] tree$tip.label <- seq(n.taxa) ##extract all monophyletic subclades get.all.subclades <- function(tree){ tmp <- vector("list") nodes <- sort(unique(tree$edge[,1])) i <- 282 for(i in 1:length(nodes)){ x <- Descendants(tree,nodes[i],type="tips")[[1]] tmp[[i]] <- tree$tip.label[x] } tmp } tmp <- get.all.subclades(tree) ##set bounds on the maximum and mininum number of tips of the subclades to include min.subclade.n.tip <- 10 max.subclade.n.tip <- 40 ##function to replace trees of tip length exceeding max and min with NA replace.trees <- function(x, min, max){ if(length(x) >= min & length(x)<= max) x else NA } #apply testNtip across all the subclades tmp2 <- lapply(tmp, replace.trees, min = min.subclade.n.tip, max = max.subclade.n.tip) ##remove elements from list with NA, ##all remaining elements are subclades with number of tips between ##min.subclade.n.tip and max.subclade.n.tip subclades <- tmp2[!is.na(tmp2)] names(subclades) <- seq(length(subclades)) 
+4
source share
2 answers

Here is an example of how you can test each pair of list items for zero overlap by retrieving the indices of all non-overlapping pairs.

 findDisjointPairs <- function(X) { ## Form a 2-column matrix enumerating all pairwise combos of X elements ij <- t(combn(length(X),2)) ## A function that tests for zero overlap between a pair of vectors areDisjoint <- function(i, j) length(intersect(X[[i]], X[[j]])) == 0 ## Use mapply to test for overlap between each pair and extract indices ## of pairs with no matches ij[mapply(areDisjoint, ij[,1], ij[,2]),] } ## Make some reproducible data and test the function on it set.seed(1) A <- replicate(sample(letters, 5), n=5, simplify=FALSE) findDisjointPairs(A) # [,1] [,2] # [1,] 1 2 # [2,] 1 4 # [3,] 1 5 
+1
source

Here are some features that may be helpful.

The first computes all possible disjoint collections of a list of sets.

I use β€œassembly” instead of β€œsection” because the collection does not necessarily span the universe (that is, the union of all sets).

The algorithm is recursive and works only for a small number of possible collections. This does not necessarily mean that it will not work with a large list of sets, since the function removes intersecting sets at each iteration.

If the code is not clear, ask and I will add comments.

The input must be a named list, and the result will be a collection list, which is a character vector denoting collection names.

 DisjointCollectionsNotContainingX <- function(L, branch=character(0), x=numeric(0)) { filter <- vapply(L, function(y) length(intersect(x, y))==0, logical(1)) L <- L[filter] result <- list(branch) for( i in seq_along(L) ) { result <- c(result, Recall(L=L[-(1:i)], branch=c(branch, names(L)[i]), x=union(x, L[[i]]))) } result } 

This is just a wrapper to hide auxiliary arguments:

 DisjointCollections <- function(L) DisjointCollectionsNotContainingX(L=L) 

The following function can be used to check a specific list of collections that are supposedly not overlapping and "maximum."

For each collection, it will check if 1. all sets do not intersect effectively and 2. adding another set results in a non-intersecting collector or existing collection:

 ValidateDC <- function(L, DC) { for( collection in DC ) { for( i in seq_along(collection) ) { others <- Reduce(f=union, x=L[collection[-i]]) if( length(intersect(L[collection[i]], others)) > 0 ) return(FALSE) } elements <- Reduce(f=union, x=L[collection]) for( k in seq_along(L) ) if( ! (names(L)[k] %in% collection) ) { if( length(intersect(elements, L[[k]])) == 0 ) { check <- vapply(DC, function(z) setequal(c(collection, names(L)[k]), z), logical(1)) if( ! any(check) ) return(FALSE) } } } TRUE } 

Example:

 L <- list(A=c(1,2,3), B=c(3,4), C=c(5,6), D=c(6,7,8)) > ValidateDC(L,DisjointCollections(L)) [1] TRUE > DisjointCollections(L) [[1]] character(0) [[2]] [1] "A" [[3]] [1] "A" "C" [[4]] [1] "A" "D" [[5]] [1] "B" [[6]] [1] "B" "C" [[7]] [1] "B" "D" [[8]] [1] "C" [[9]] [1] "D" 

Please note that collections containing A and B are not displayed at the same time due to their non-empty intersection. In addition, collections with C and D not displayed at the same time. Others are fine.

Note: an empty character(0) collection is always a valid combination.

After creating all possible disjoint collections, you can apply any filters you want to continue.


EDIT:

  • I removed the line if( length(L)==0 ) return(list(branch)) from the first function; He's not needed.

  • Performance. If there is significant overlap between sets, the function works quickly. Example:

    set.seed (1)

    L <- lapply (1:50, function (.) Sample (x = 1200, size = 20))

    names (L) <- c (LETTERS, letters) [1:50]

    system.time (DC <- DisjointCollections (L))

Result:

 # user system elapsed # 9.91 0.00 9.92 

Total number of collections found:

 > length(DC) [1] 121791 
+1
source

Source: https://habr.com/ru/post/1481997/


All Articles