The practical application of this problem is a group task in the study of psychology, but the theoretical formulation is as follows:
I have a matrix (the actual matrix is 27x72, but I will take a 4x8 example):
1 0 1 0 0 1 0 1 1 1 0 0 0 1 1 0 0 0 1 1 1 0 1 0 1 1 0 0 0 1 0 1
I want to extract half of the rows from this matrix so that the resulting column values are equal (thus effectively creating two matrices with equivalent column sums). I cannot reorder the values inside the rows.
I tried some brute force solutions, but my matrix is too large for it to be effective, even choosing some random constraints first. It seems to me that the search space may be limited by the best algorithm, but so far I have not been able to think of one thing. Any ideas? It is also possible that there is no solution, so the algorithm would have to deal with this. I worked in R, but I could easily switch to python.
Update Found a solution thanks to ljeabmreosn. Karmarkar-Karp did a great job on the algorithm, and the conversion of strings to base 73 was inspired. It was very difficult for me to find a code that would actually give me subsequences, and not just the final difference (maybe most people are only interested in this problem in the abstract?). In any case, it was the code:
First I converted my lines to base 73 as a suggested poster. For this, I used the basic package in python, defining an alphabet with 73 characters, and then using the basein.decode function to convert to decimel.
For the algorithm, I just added code for printing subsequence indices from this mailing list message from Tim Peters: https://mail.python.org/pipermail/tutor/2001-August/008098.html
from __future__ import nested_scopes import sys import bisect class _Num: def __init__(self, value, index): self.value = value self.i = index def __lt__(self, other): return self.value < other.value
This gives the following result:
Set 1: [0, 3, 5, 6, 9, 10, 12, 15, 17, 19, 21, 22, 24, 26, 28, 31, 32, 34, 36, 38, 41, 43, 45, 47, 48, 51, 53, 54, 56, 59, 61, 62, 65, 66, 68, 71] Set 2: [1, 2, 4, 7, 8, 11, 13, 14, 16, 18, 20, 23, 25, 27, 29, 30, 33, 35, 37, 39, 40, 42, 44, 46, 49, 50, 52, 55, 57, 58, 60, 63, 64, 67, 69, 70] Set 1 sum 30309344369339288555041174435706422018348623853211009172L Set 2 sum 30309344369339288555041174435706422018348623853211009172L difference 0L
Which provides indices of its own subsets in a few seconds. Thanks everyone!