How to implement a simple multiset greedy algorithm in python

I would like to implement the following algorithm. For n and k consider all combinations with repetitions in sorted order, where we select k numbers from {0,..n-1} with repetitions. For example, if n=5 and k =3 , we have:

(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 0, 4), (0, 1, 1), (0 , 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 2), (0, 2, 3), (0, 2, 4), (0, 3 , 3), (0, 3, 4), (0, 4, 4), (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4 ), (1, 2, 2), (1, 2, 3), (1, 2, 4), (1, 3, 3), (1, 3, 4), (1, 4, 4), (2, 2, 2), (2, 2, 3), (2, 2, 4), (2, 3, 3), (2, 3, 4), (2, 4, 4), (3 , 3, 3), (3, 3, 4), (3, 4, 4), (4, 4, 4)]

Now I will consider each combination as a multiset. I want to go through these multisets eagerly and break the list. A section has the property that the intersection size of all multisets inside it must be at least k-1 . So, in this case we have:

 (0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 0, 4) 

then

  (0, 1, 1), (0, 1, 2), (0, 1, 3), (0, 1, 4) 

then

 (0, 2, 2), (0, 2, 3), (0, 2, 4) 

then

 (0, 3, 3), (0, 3, 4) 

then

 (0, 4, 4) 

etc.

In python, you can iterate over combinations like this:

 import itertools for multiset in itertools.combinations_with_replacement(range(5),3): #Greedy algo 

How can I create these sections?

One of the problems that I have encountered is calculating the intersection size of multisets. For example, the intersection of multisets (2,1,2) and (3,2,2) has size 2.


Here is the complete answer for n=4, k=4 .

 (0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 0, 2), (0, 0, 0, 3) (0, 0, 1, 1), (0, 0, 1, 2), (0, 0, 1, 3) (0, 0, 2, 2), (0, 0, 2, 3) (0, 0, 3, 3) (0, 1, 1, 1), (0, 1, 1, 2), (0, 1, 1, 3) (0, 1, 2, 2), (0, 1, 2, 3) (0, 1, 3, 3) (0, 2, 2, 2), (0, 2, 2, 3) (0, 2, 3, 3), (0, 3, 3, 3) (1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 1, 3) (1, 1, 2, 2), (1, 1, 2, 3) (1, 1, 3, 3) (1, 2, 2, 2), (1, 2, 2, 3) (1, 2, 3, 3), (1, 3, 3, 3) (2, 2, 2, 2), (2, 2, 2, 3) (2, 2, 3, 3), (2, 3, 3, 3) (3, 3, 3, 3) 
+5
source share
2 answers

One way to create partitions is to iterate over your iterator, and then compare each multiset * with the previous one. I tested 4 ways ** to compare multipliers, and the fastest I found was to check the in membership of the iterator of the previous multiset, which is consumed, and short circuits after completing the membership test. If the number of identical elements in the multiset and the previous multiset is equal to the length of the multiset minus 1, then the criteria for their grouping are satisfied. Then, the resulting list output generator is created, where you append elements that match the criteria of the previous list , and start a new list containing tuple otherwise, yield time to minimize memory usage:

 import itertools def f(n,k): prev, group = None, [] for multiset in itertools.combinations_with_replacement(range(n),k): if prev: it = iter(prev) for idx, item in enumerate(multiset): if item not in it: break if idx == len(multiset) - 1: group.append(multiset) continue if group: yield group group = [multiset] prev = multiset yield group 

Testing

Input:

 for item in f(4,4): print(item) 

Output:

 [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 0, 2), (0, 0, 0, 3)] [(0, 0, 1, 1), (0, 0, 1, 2), (0, 0, 1, 3)] [(0, 0, 2, 2), (0, 0, 2, 3)] [(0, 0, 3, 3)] [(0, 1, 1, 1), (0, 1, 1, 2), (0, 1, 1, 3)] [(0, 1, 2, 2), (0, 1, 2, 3)] [(0, 1, 3, 3)] [(0, 2, 2, 2), (0, 2, 2, 3)] [(0, 2, 3, 3), (0, 3, 3, 3)] [(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 1, 3)] [(1, 1, 2, 2), (1, 1, 2, 3)] [(1, 1, 3, 3)] [(1, 2, 2, 2), (1, 2, 2, 3)] [(1, 2, 3, 3), (1, 3, 3, 3)] [(2, 2, 2, 2), (2, 2, 2, 3)] [(2, 2, 3, 3), (2, 3, 3, 3)] [(3, 3, 3, 3)] 

Input:

 for item in f(5,3): print(item) 

Output:

 [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 0, 4)] [(0, 1, 1), (0, 1, 2), (0, 1, 3), (0, 1, 4)] [(0, 2, 2), (0, 2, 3), (0, 2, 4)] [(0, 3, 3), (0, 3, 4)] [(0, 4, 4)] [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4)] [(1, 2, 2), (1, 2, 3), (1, 2, 4)] [(1, 3, 3), (1, 3, 4)] [(1, 4, 4)] [(2, 2, 2), (2, 2, 3), (2, 2, 4)] [(2, 3, 3), (2, 3, 4)] [(2, 4, 4)] [(3, 3, 3), (3, 3, 4)] [(3, 4, 4), (4, 4, 4)] 

* I call them multisets according to your terminology, but they are actually tuple (ordered and immutable data structures); using the collections.Counter object, for example Counter((0, 0, 0, 1)) return Counter({0: 3, 1: 1}) , and the decrement will look like a true multi-position approach, but I found it to be slower, because using order is really useful.

** Other slower functions that give the same result as I tested:

 def f2(n,k): prev, group = None, [] for multiset in itertools.combinations_with_replacement(range(n),k): if prev: if sum(item1 == item2 for item1, item2 in zip(prev,multiset)) == len(multiset) - 1: group.append(multiset) continue if group: yield group group = [multiset] prev = multiset yield group def f3(n,k): prev, group = None, [] for multiset in itertools.combinations_with_replacement(range(n),k): if prev: lst = list(prev) for item in multiset: if item in lst: lst.remove(item) else: break if len(multiset) - len(lst) == len(multiset) - 1: group.append(multiset) continue if group: yield group group = [multiset] prev = multiset yield group import collections def f4(n,k): prev, group = None, [] for multiset in itertools.combinations_with_replacement(range(n),k): if prev: if sum((collections.Counter(prev) - collections.Counter(multiset)).values()) == 1: group.append(multiset) continue if group: yield group group = [multiset] prev = multiset yield group 

Timing example:

 from timeit import timeit list(f(11,10)) == list(f2(11,10)) == list(f3(11,10)) == list(f4(11,10)) # True timeit(lambda: list(f(11,10)), number = 10) # 4.19157001003623 timeit(lambda: list(f2(11,10)), number = 10) # 7.32002648897469 timeit(lambda: list(f3(11,10)), number = 10) # 6.236868146806955 timeit(lambda: list(f4(11,10)), number = 10) # 47.20136355608702 

Note that all approaches become slow with large values ​​of n and k due to the large number of generated combinations.

+3
source

We can view the tuples in the set / list that you want to split as numbers of length k with base n. Considered as a number, your algorithm is greedy on the first minimum number. Let the set of all numbers with k "digits" and base n denote N (k, n). Ignoring the fact that N (k, n) is not exactly the list that you want to split at the moment, we can split N (k, n) into split criteria, greedily on the smallest first base is pretty trivial; by counting from 0 (for example, 00000 in the case of k = 5, for example) and creating a new section each time a transfer occurs during the counting (i.e., overflow from the digit i to the digit i + 1). That is, the rule: carry <=> new_partition.

Evidence. Suppose that A is the value after the transfer, and the transfer was included in the i-th digit. The promotion has a common prefix with all numbers in the previous section before the transfer to, but excluding the i-th one, and therefore at least 1 is different. Only one has a suffix after I with one other previous (lower) number, but this number is already in the section with other numbers differing by more than 1 from A, so A starts a new section.

However, according to your specification, we consider only a subset of N(k,n) ; X , where for any x in X, x [i] <= x [j] for i> j. This adds a little complication to the above rule <=> of the new section. Now:

  • new_partition => transfer
  • but porting doesn't necessarily mean new_partition

There is only one condition under which the transfer does not imply new_partition: there has just been a transfer creating a new partition, then there is another transfer called x [i] <= x [j] when i> j is corrected. The following transfer does not cause more than one change, so it does not imply a new section.

Implementation:


 class ExpNum: ''' Represents a number with base @base, @size digits, and funny successor semantics. ''' def __init__(self, base, size): if size <= 0 or base <= 1: raise Exception("Bad args") self.size = size self.base = base self.number = [0]*size self.zero = [0]*size def increment(self): ''' Increment number by one. If we carry return index of carry else return -1. ''' carried = -1 for i in reversed(range(0, len(self.number))): self.number[i] = (self.number[i]+1)%self.base if self.number[i] != 0: break carried = i if carried >= 0: self.pullup() return carried def pullup(self): ''' Ensure x[i] <= x[j] when i > j ''' for i in range(0, len(self.number)): if self.number[i] == 0 and i > 0: self.number[i] = self.number[i-1] def out_by_one_partition(self): ''' Do the partition by counting from 0 to n**k ''' self.number = [0]*self.size just_carried = False partition = [list(self.number)] carried = self.increment() while self.number != self.zero: # Check for exception to carry => new partition. if carried >= 0 and not (just_carried and list(self.number)[carried] == (self.base -1) and len(partition) == 1): yield(partition) partition = [] partition += [list(self.number)] just_carried = carried >= 0 carried = self.increment() yield(partition) 

Test:

 from ExpNum import ExpNum from timeit import timeit from pprint import pprint pprint(list(ExpNum(4,4).out_by_one_partition())) print(timeit(lambda: list(ExpNum(11,10).out_by_one_partition()), number = 10)) 

Test result:

 [[[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 2], [0, 0, 0, 3]], [[0, 0, 1, 1], [0, 0, 1, 2], [0, 0, 1, 3]], [[0, 0, 2, 2], [0, 0, 2, 3]], [[0, 0, 3, 3]], [[0, 1, 1, 1], [0, 1, 1, 2], [0, 1, 1, 3]], [[0, 1, 2, 2], [0, 1, 2, 3]], [[0, 1, 3, 3]], [[0, 2, 2, 2], [0, 2, 2, 3]], [[0, 2, 3, 3], [0, 3, 3, 3]], [[1, 1, 1, 1], [1, 1, 1, 2], [1, 1, 1, 3]], [[1, 1, 2, 2], [1, 1, 2, 3]], [[1, 1, 3, 3]], [[1, 2, 2, 2], [1, 2, 2, 3]], [[1, 2, 3, 3], [1, 3, 3, 3]], [[2, 2, 2, 2], [2, 2, 2, 3]], [[2, 2, 3, 3], [2, 3, 3, 3]], [[3, 3, 3, 3]]] 10.25355386902811 
+1
source

Source: https://habr.com/ru/post/1266769/


All Articles