Below is the itertools approach for the first part of what you call Iteration 1 . If it is possible to run it at all, you can run it several times in a loop, deleting the elements found at each stage. As I pointed out in the comments, this is only possible with small n :
import itertools def intersect(sets): #forms the intersection of all s in sets #assumes that there is at least one I = sets[0].copy() for s in sets[1:]: I = I & s return I def largestIntersection(sets,n): maxSize = 0 maxSets = [set()] for groupCombo in itertools.combinations(sets,n): I = intersect(groupCombo) if len(I) > maxSize: maxSize = len(I) maxSets = [I] elif len(I) == maxSize: maxSets.append(I) return maxSets
For instance:
>>> groups = [ [1], [1, 2 ,3], [1, 2, 3, 4, 5], [1, 2, 3 ,4], [2, 3], [4, 5, 6], [4, 5, 7] ] >>> groups = [set(group) for group in groups] >>> largestIntersection(groups,3) [{1, 2, 3}]
In the editor:. The following modification may help - depending on the distribution of numbers in the groups and the size of the groups:
def guessSize(sets,n,trials): return max(len(intersect(random.sample(sets,n))) for trial in range(trials)) def maxIntersection(sets,n,trials = 100000): k = guessSize(sets,n,trials) filteredSets = [s for s in sets if len(s) >= k] return largestIntersection(filteredSets,n)
The idea is to first reduce the number of groups before trying to iterate over intersections. For instance:
#stress test: import random nums = list(range(1,101)) testGroups = [set(random.sample(nums,random.randint(1,100))) for n in range(1000)] foundGroups = maxIntersection(testGroups,3)
it only takes a few seconds to calculate foundGroups , not a few minutes if I used the largestIntersection(testGroups) directly. On the other hand, with various variants of random parameters, the time savings become negligible.
Further editing: Perhaps you can be more aggressive with filtering:
import collections def maxIntersection(sets,n,trials = 100000): k = guessSize(sets,n,trials) filteredSets = [s for s in sets if len(s) >= k] counts = collections.Counter() for s in filteredSets: for x in s: counts[x] +=1 t = {x for x,c in counts.items() if c >= k} filteredSets = [s & t for s in filteredSets if len(s & t) >= k] return largestIntersection(filteredSets,n)