Data in a list in a list

Given that:

g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]] 

How can I compare each list in g, so that for lists that can use a common number can be combined into a set?

eg.
0 exists in g[2] and g[4] so they merge with the set {0,2,3,7}

I tried the following, but it does not work:

 for i in g: for j in g: if k in i == l in j: m=set(i+j) 

I want to make the maximum possible set.

+6
source share
3 answers

Here's a quickie that will give a list of all intersecting sets:

 sets = [set(i+j) for i in g for j in g if i!=j and (set(i) & set(j))] 

Note that each result will be repeated, since each list is compared twice, once on the left and once on the right.

+1
source

How much faster . You can first create a list of element sets with len more than one ( s ). then view the list and update it with union !

 s=map(set,g) def find_intersection(m_list): for i,v in enumerate(m_list) : for j,k in enumerate(m_list[i+1:],i+1): if v & k: m_list[i]=v.union(m_list.pop(j)) return find_intersection(m_list) return m_list 

Demo:

 g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]] s=map(set,g) print find_intersection(s) [set([0, 2, 3, 7]), set([1, 4, 5, 6])] g=[[1,2,3],[3,4,5],[5,6],[6,7],[9,10],[10,11]] s=map(set,g) print find_intersection(s) [set([1, 2, 3, 4, 5, 6, 7]), set([9, 10, 11])] g=[[], [1], [0,2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]] s=map(set,g) print find_intersection(s) [set([1, 4, 5, 6]), set([0, 2, 3, 7])] 

Test with @Mark answer:

 from timeit import timeit s1="""g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]] sets = [set(i+j) for i in g for j in g if i!=j and (set(i) & set(j))] """ s2="""g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7]] s=map(set,g) def find_intersection(m_list): for i,v in enumerate(m_list) : for j,k in enumerate(m_list[i+1:],i+1): if v & k: s[i]=v.union(m_list.pop(j)) return find_intersection(m_list) return m_list """ print ' first: ' ,timeit(stmt=s1, number=100000) print 'second : ',timeit(stmt=s2, number=100000) first: 3.8284008503 second : 0.213887929916 
+1
source

If the g or g elements are huge, you can use Disjoint Sets to increase efficiency.

This data structure can be used to classify each element in the set to which it should belong.

The first step is to create a Disjoint Set collection with all g sets labeled with their index in g :

 g=[[], [], [0, 2], [1, 5], [0, 2, 3, 7], [4, 6], [1, 4, 5, 6], [], [], [3, 7],[99]] g = map(set, g) dss = CDisjointSets() for i in xrange(len(g)): dss.MakeSet(i) 

Then the sets are joined whenever the intersection is not empty:

 for i in xrange(len(g)): for j in xrange(i+1, len(g)): if g[i].intersection(g[j]): dss.Join(i,j) 

At this point in dss you will get a generic label for g sets to be combined:

 print(dss) 

parent (0) = 0 parent (1) = 1 parent (2) = 2 parent (3) = 3 parent (4) = 2 parent (5) = 3 parent (6) = 3 parent (7) = 7 parent ( 8) = 8 parent (9) = 2 parent (10) = 10

Now you just need to build new sets connecting those that have the same label:

 l2set = dict() for i in xrange(len(g)): label = dss.FindLabel(i).getLabel() l2set[label] = l2set.get(label, set()).union(g[i]) print(l2set) 

Result:

 {0: set([]), 1: set([]), 2: set([0, 2, 3, 7]), 3: set([1, 4, 5, 6]), 7: set([]), 8: set([]), 10: set([99])} 

This is the implementation of the disjoint sets that I used, but you can certainly find another with better sintax:

 """ Disjoint Sets ------------- Pablo Francisco PΓ©rez Hidalgo December,2012. """ class CDisjointSets: #Class to represent each set class DSet: def __init__(self, label_value): self.__label = label_value self.rank = 1 self.parent = self def getLabel(self): return self.__label #CDisjointSets Private attributes __sets = None #CDisjointSets Constructors and public methods. def __init__(self): self.__sets = {} def MakeSet(self, label): if label in self.__sets: #This check slows the operation a lot, return False #it should be removed if it is sure that #two sets with the same label are not goind #to be created. self.__sets[label] = self.DSet(label) #Pre: 'labelA' and 'labelB' are labels or existing disjoint sets. def Join(self, labelA, labelB): a = self.__sets[labelA] b = self.__sets[labelB] pa = self.Find(a) pb = self.Find(b) if pa == pb: return #They are already joined parent = pa child = pb if pa.rank < pb.rank: parent = pb child = pa child.parent = parent parent.rank = max(parent.rank, child.rank+1) def Find(self,x): if x == x.parent: return x x.parent = self.Find(x.parent) return x.parent def FindLabel(self, label): return self.Find(self.__sets[label]) def __str__(self): ret = "" for e in self.__sets: ret = ret + "parent("+self.__sets[e].getLabel().__str__()+") = "+self.FindLabel(e).parent.getLabel().__str__() + "\n" return ret 
+1
source

Source: https://habr.com/ru/post/980565/


All Articles