At first I underestimated the complexity of the problem and wrote a function that finds a set of elements covering all m sets, but then I realized that the result is not necessarily the smallest such set:
```r
cover <- function(sets, elements = NULL) {
  if (is.null(elements)) {
    # Build the union of all sets
    su <- integer()
    for (si in sets) su <- union(su, si)
  } else {
    su <- elements
  }
  s <- su
  for (e in su) {
    # Candidate cover with one element removed
    sc <- setdiff(s, e)
    # Keep the removal only if every set still shares
    # at least one element with the candidate
    ok <- TRUE
    for (si in sets) {
      if (!any(match(si, sc, nomatch = 0L))) {
        ok <- FALSE
        break
      }
    }
    if (ok) s <- sc
  }
  s  # the resulting cover
}

sets <- list(s1 = c(1, 3, 4), s2 = c(1, 3), s3 = c(4))
cover(sets)
# [1] 3 4
```
Then we can time it:
```r
n <- 100    # number of elements
m <- 1000   # number of sets
sets <- lapply(seq_len(m), function(i) sample.int(n, runif(1, 1, n)))
system.time(s <- cover(sets))
# 0.53 seconds
```
Not bad, but the result is still not guaranteed to be the smallest cover.
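The greedy removal is order-dependent: which cover you end up with depends on the order in which elements are tried. A small made-up example (the variable `sets2` is mine, not part of the answer's code):

```r
# Element 1 alone intersects every set, but a greedy pass that
# removes element 1 early can never rediscover that
sets2 <- list(c(1, 2), c(1, 3), c(1, 4))
length(cover(sets2))                            # default element order
length(cover(sets2, elements = c(2, 3, 4, 1)))  # trying 1 last gives a smaller cover
```

Both calls return valid covers, but the second order yields a smaller one, which motivates trying several orders.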
The obvious exact solution: generate all permutations of the elements, run the cover function on each, and keep the smallest result. With n! orderings, this would take practically forever.
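For very small inputs the exhaustive search can still be sketched. The helpers `perms` and `exact_cover` below are my own names, not part of the answer's code:

```r
# All permutations of a vector -- only feasible for very small inputs
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out <- append(out, list(c(v[i], p)))
  }
  out
}

# Run cover() on every element ordering and keep the smallest result
exact_cover <- function(sets) {
  su <- Reduce(union, sets)
  best <- su
  for (p in perms(su)) {
    s <- cover(sets, p)
    if (length(s) < length(best)) best <- s
  }
  best
}
```

Since the loop visits all length(su)! orderings, this is only a correctness reference, not something to run on n = 100 elements.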
A more practical approach is to try a limited number of random permutations; this gives an approximation of the smallest set.
```r
ns <- 10                  # number of random permutations to try
elements <- seq_len(n)
smin <- cover(sets)       # baseline cover with the default order
for (i in seq_len(ns)) {
  s <- cover(sets, sample(elements))
  if (length(s) < length(smin)) smin <- s
}
length(smin)  # length of the best (approximately smallest) cover found
```
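The sampling loop can be packaged as a reusable function; the name `cover_approx` is my own:

```r
# Random-restart wrapper around cover(): try ns random element
# orderings and return the smallest cover seen
cover_approx <- function(sets, ns = 10) {
  elements <- Reduce(union, sets)
  smin <- cover(sets)  # deterministic baseline with the default order
  for (i in seq_len(ns)) {
    s <- cover(sets, sample(elements))
    if (length(s) < length(smin)) smin <- s
  }
  smin
}
```

Increasing `ns` improves the approximation at a linear cost in runtime, since each restart is one full pass of `cover()`.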
Tommy