Here are some features that may be helpful.
The first computes all possible disjoint collections of a list of sets.
I use βassemblyβ instead of βsectionβ because the collection does not necessarily span the universe (that is, the union of all sets).
The algorithm is recursive and works only for a small number of possible collections. This does not necessarily mean that it will not work with a large list of sets, since the function removes intersecting sets at each iteration.
If the code is not clear, ask and I will add comments.
The input must be a named list, and the result will be a collection list, which is a character vector denoting collection names.
DisjointCollectionsNotContainingX <- function(L, branch=character(0), x=numeric(0)) { filter <- vapply(L, function(y) length(intersect(x, y))==0, logical(1)) L <- L[filter] result <- list(branch) for( i in seq_along(L) ) { result <- c(result, Recall(L=L[-(1:i)], branch=c(branch, names(L)[i]), x=union(x, L[[i]]))) } result }
This is just a wrapper to hide auxiliary arguments:
DisjointCollections <- function(L) DisjointCollectionsNotContainingX(L=L)
The following function can be used to check a specific list of collections that are supposedly not overlapping and "maximum."
For each collection, it will check if 1. all sets do not intersect effectively and 2. adding another set results in a non-intersecting collector or existing collection:
ValidateDC <- function(L, DC) { for( collection in DC ) { for( i in seq_along(collection) ) { others <- Reduce(f=union, x=L[collection[-i]]) if( length(intersect(L[collection[i]], others)) > 0 ) return(FALSE) } elements <- Reduce(f=union, x=L[collection]) for( k in seq_along(L) ) if( ! (names(L)[k] %in% collection) ) { if( length(intersect(elements, L[[k]])) == 0 ) { check <- vapply(DC, function(z) setequal(c(collection, names(L)[k]), z), logical(1)) if( ! any(check) ) return(FALSE) } } } TRUE }
Example:
L <- list(A=c(1,2,3), B=c(3,4), C=c(5,6), D=c(6,7,8)) > ValidateDC(L,DisjointCollections(L)) [1] TRUE > DisjointCollections(L) [[1]] character(0) [[2]] [1] "A" [[3]] [1] "A" "C" [[4]] [1] "A" "D" [[5]] [1] "B" [[6]] [1] "B" "C" [[7]] [1] "B" "D" [[8]] [1] "C" [[9]] [1] "D"
Please note that collections containing A and B are not displayed at the same time due to their non-empty intersection. In addition, collections with C and D not displayed at the same time. Others are fine.
Note: an empty character(0) collection is always a valid combination.
After creating all possible disjoint collections, you can apply any filters you want to continue.
EDIT:
I removed the line if( length(L)==0 ) return(list(branch)) from the first function; He's not needed.
Performance. If there is significant overlap between sets, the function works quickly. Example:
set.seed (1)
L <- lapply (1:50, function (.) Sample (x = 1200, size = 20))
names (L) <- c (LETTERS, letters) [1:50]
system.time (DC <- DisjointCollections (L))
Result:
# user system elapsed # 9.91 0.00 9.92
Total number of collections found:
> length(DC) [1] 121791