Multiple Subset Count

I have 2 sets, the set A contains many random numbers, and the set B of elements is the sum of the set A of the subsets.

For instance,

A = [8, 9, 15, 15, 33, 36, 39, 45, 46, 60, 68, 73, 80, 92, 96] B = [183, 36, 231, 128, 137] 

I want to find what number is the sum of the subset with such data.

 S = [[45, 46, 92], [36], [8, 15, 39, 73, 96], [60, 68], [9, 15, 33, 80]] 

I was able to write really dumb brute force code using python.

 class SolvedException(BaseException): pass def solve(sums, nums, answer): num = nums[-1] for i in range(0, len(sums)): sumi = sums[i] if sumi == 0: continue elif sumi - num < 0: continue answer[i].append(num) sums[i] = sumi - num if len(nums) != 1: solve(sums, nums[:-1], answer) elif sumi - num == 0: raise SolvedException(answer) sums[i] = sumi answer[i].pop() try: solve(B, A, [list() for i in range(0, len(B))]) except SolvedException as e: print e.args[0] 

This code is very suitable for small data, but it will take about a billion years (which has 71 numbers and 10 sums) to calculate my data.

I could use some of the best algorithms or optimizations.

Sorry for my bad english and terrible inefficient code.


Edit: Sorry, I realized that I did not describe the problem exactly.

Since each individual element in A used to create elements of B, sum(A) == sum(B)

In addition, set S must be a section of set A

+5
source share
2 answers

This is called the subset sum problem and is a well-known NP-complete problem. Thus, in principle, there is no effective solution. See for example https://en.wikipedia.org/wiki/Subset_sum_problem

However, if your number N is not too large, there are pseudopolynomial algorithms that use dynamic programming: you read list A from left to right and save a list of sums that are feasible and less than N. If you know a number that can be made for a given A, you can it’s easy to get the ones that can be done for A + [a]. Hence the dynamic programming. Usually this will be fast enough for a problem with the size you gave there.

Here is a quick Python solution:

 def subsetsum(A, N): res = {0 : []} for i in A: newres = dict(res) for v, l in res.items(): if v+i < N: newres[v+i] = l+[i] elif v+i == N: return l+[i] res = newres return None 

Then

 >>> A = [8, 9, 15, 15, 33, 36, 39, 45, 46, 60, 68, 73, 80, 92, 96] >>> subsetsum(A, 183) [15, 15, 33, 36, 39, 45] 

After editing the OP:

Now I understand your problem correctly, I will still think that your problem can be solved effectively if you have an effective subset resolver: I would use the divide and conquer solution on B:

  • cut B into two approximately equal parts B1 and B2
  • use your solver of the subset of sums to search among A for all subsets of S whose sum is equal to the sum (B1).
  • for each such S:
    • call recursively solves (S, B1) and solves (A - S, B2)
    • If you both have a solution

However, your (71, 10) problem below is not available for the dynamic programming solution I proposed.


By the way, here is a quick solution to your problem, not using divide and conquer, but which contains the correct adaptation of my dynamic solver to get all the solutions:

 class NotFound(BaseException): pass from collections import defaultdict def subset_all_sums(A, N): res = defaultdict(set, {0 : {()}}) for nn, i in enumerate(A): # perform a deep copy of res newres = defaultdict(set) for v, l in res.items(): newres[v] |= set(l) for v, l in res.items(): if v+i <= N: for s in l: newres[v+i].add(s+(i,)) res = newres return res[N] def list_difference(l1, l2): ## Similar to merge. res = [] i1 = 0; i2 = 0 while i1 < len(l1) and i2 < len(l2): if l1[i1] == l2[i2]: i1 += 1 i2 += 1 elif l1[i1] < l2[i2]: res.append(l1[i1]) i1 += 1 else: raise NotFound while i1 < len(l1): res.append(l1[i1]) i1 += 1 return res def solve(A, B): assert sum(A) == sum(B) if not B: return [[]] res = [] ss = subset_all_sums(A, B[0]) for s in ss: rem = list_difference(A, s) for sol in solve(rem, B[1:]): res.append([s]+sol) return res 

Then:

 >>> solve(A, B) [[(15, 33, 39, 96), (36,), (8, 15, 60, 68, 80), (9, 46, 73), (45, 92)], [(15, 33, 39, 96), (36,), (8, 9, 15, 46, 73, 80), (60, 68), (45, 92)], [(8, 15, 15, 33, 39, 73), (36,), (9, 46, 80, 96), (60, 68), (45, 92)], [(15, 15, 73, 80), (36,), (8, 9, 33, 39, 46, 96), (60, 68), (45, 92)], [(15, 15, 73, 80), (36,), (9, 39, 45, 46, 92), (60, 68), (8, 33, 96)], [(8, 33, 46, 96), (36,), (9, 15, 15, 39, 73, 80), (60, 68), (45, 92)], [(8, 33, 46, 96), (36,), (15, 15, 60, 68, 73), (9, 39, 80), (45, 92)], [(9, 15, 33, 46, 80), (36,), (8, 15, 39, 73, 96), (60, 68), (45, 92)], [(45, 46, 92), (36,), (8, 15, 39, 73, 96), (60, 68), (9, 15, 33, 80)], [(45, 46, 92), (36,), (8, 15, 39, 73, 96), (15, 33, 80), (9, 60, 68)], [(45, 46, 92), (36,), (15, 15, 60, 68, 73), (9, 39, 80), (8, 33, 96)], [(45, 46, 92), (36,), (9, 15, 15, 39, 73, 80), (60, 68), (8, 33, 96)], [(9, 46, 60, 68), (36,), (8, 15, 39, 73, 96), (15, 33, 80), (45, 92)]] >>> %timeit solve(A, B) 100 loops, best of 3: 10.5 ms per loop 

So this problem size is pretty fast, although note is optimized here.

+9
source

A complete solution that calculates the whole way to make a conclusion. I use ints as characteristic sets for speed and memory usage: 19='0b10011' represents [A[0],A[1],A[4]]=[8,9,33] here.

 A = [8, 9, 15, 15, 33, 36, 39, 45, 46, 60, 68, 73, 80, 92, 96] B =[183, 36, 231, 128, 137] def subsetsum(A,N): res=[[0]]+[[] for i in range(N)] for i,a in enumerate(A): k=1<<i stop=[len(l) for l in res] for shift,l in enumerate(res[:N+1-a]): n=a+shift ln=res[n] for s in l[:stop[shift]]: ln.append(s+k) return res res = subsetsum(A,max(B)) solB = [res[b] for b in B] exactsol = ~-(1<<len(A)) def decode(answer): return [[A[i] for i,b in enumerate(bin(sol)[::-1]) if b=='1'] for sol in answer] def solve(i,currentsol,answer): if currentsol==exactsol : print(decode(answer)) if i==len(B): return for sol in solB[i]: if not currentsol&sol: answer.append(sol) solve(i+1,currentsol+sol,answer) answer.pop() 

For:

 solve(0,0,[]) [[9, 46, 60, 68], [36], [8, 15, 39, 73, 96], [15, 33, 80], [45, 92]] [[9, 46, 60, 68], [36], [8, 15, 39, 73, 96], [15, 33, 80], [45, 92]] [[8, 15, 15, 33, 39, 73], [36], [9, 46, 80, 96], [60, 68], [45, 92]] [[9, 15, 33, 46, 80], [36], [8, 15, 39, 73, 96], [60, 68], [45, 92]] [[9, 15, 33, 46, 80], [36], [8, 15, 39, 73, 96], [60, 68], [45, 92]] [[15, 15, 73, 80], [36], [9, 39, 45, 46, 92], [60, 68], [8, 33, 96]] [[15, 15, 73, 80], [36], [8, 9, 33, 39, 46, 96], [60, 68], [45, 92]] [[45, 46, 92], [36], [15, 15, 60, 68, 73], [9, 39, 80], [8, 33, 96]] [[45, 46, 92], [36], [9, 15, 15, 39, 73, 80], [60, 68], [8, 33, 96]] [[45, 46, 92], [36], [8, 15, 39, 73, 96], [60, 68], [9, 15, 33, 80]] [[45, 46, 92], [36], [8, 15, 39, 73, 96], [15, 33, 80], [9, 60, 68]] [[45, 46, 92], [36], [8, 15, 39, 73, 96], [60, 68], [9, 15, 33, 80]] [[45, 46, 92], [36], [8, 15, 39, 73, 96], [15, 33, 80], [9, 60, 68]] [[15, 33, 39, 96], [36], [8, 15, 60, 68, 80], [9, 46, 73], [45, 92]] [[15, 33, 39, 96], [36], [8, 9, 15, 46, 73, 80], [60, 68], [45, 92]] [[15, 33, 39, 96], [36], [8, 15, 60, 68, 80], [9, 46, 73], [45, 92]] [[15, 33, 39, 96], [36], [8, 9, 15, 46, 73, 80], [60, 68], [45, 92]] [[8, 33, 46, 96], [36], [15, 15, 60, 68, 73], [9, 39, 80], [45, 92]] [[8, 33, 46, 96], [36], [9, 15, 15, 39, 73, 80], [60, 68], [45, 92]] 

Note that if two 15 not in the same subset, the solution doubles.

Solves the unique problem of solving:

 A=[1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023, 1024, 1025, 1026, 1027, 1028, 1029, 1030, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049] B=[5010, 5035, 5060, 5085, 5110, 5135, 5160, 5185, 5210, 5235] 

in one second. Unfortunately, it is not yet optimized enough for the problem (71,10).

Another one in the spirit of pure dynamic programming:

 @functools.lru_cache(max(B)) def solutions(n): if n==0 : return set({frozenset()}) #{{}} if n<0 : return set() sols=set() for i,a in enumerate(A): for s in solutions(na): if i not in s : sols.add(s|{i}) return sols def decode(answer): return([[A[i] for i in sol] for sol in answer]) def solve(B=B,currentsol=set(),answer=[]): if len(currentsol)==len(A) : sols.append(decode(answer)) if B: for sol in solutions(B[0]): if set.isdisjoint(currentsol,sol): solve(B[1:],currentsol|sol,answer+[sol]) sols=[];solve() 
+1
source

Source: https://habr.com/ru/post/1264644/


All Articles