Select some of the many binary sequences so that their "or" result is 1111111111 .... 111

I have N binary sequences of length L, where N and L are possibly very large, and these sequences can be very sparse, say, have much more 0s and then 1s.

I want to select from them M sequences, namely b_1, b_2, b_3 ... so that

b_1 | b_2 | b_3 ... | b_M = 1111...11 (L 1s) 

Is there an algorithm to achieve it?

My idea:

STEP1: for a position from 1 to L, count the total number of sequences that have 1 in that position. Call it Owner Number

STEP 2: Consider a position that has a minimum quantity, and select a sequence that has a maximum number of 1s from its own sequence of that position.

STEP3: ignore the selected sequence, update the owner number and return to STEP2.

I believe that my method cannot give a better answer.

Anyone have a better idea?

+4
source share
3 answers

This is a known cover issue . This is NP-hard - in fact, its version of the solution is one of the canonical NP-complete problems and is one of the 21 problems included in the Carp 1972 document - and therefore an effective algorithm is not known for its solution.

The algorithm that you describe in your question is known as the " greedy algorithm " and (if your problem does not have special functions that you are not telling us), this is, in fact, the most famous approach. He finds a collection of collections that is no larger than O (log | N |), multiplied by the size of the smallest such collection.

+8
source

Sounds like a typical backtrack task.

Yes, your algorithm sounds reasonable if you want to quickly get a good answer. If you want to have a combination of the least possible patterns, you cannot do better than try all the combinations.

+2
source

Depending on the exact structure of the problem, there is another method that often works well (and actually gives an optimal result):

Let x[j] be a logical variable representing the choice of whether to include the jth binary sequence in the result. The zero suppressed binary decision diagram can now represent (perhaps briefly, depending on the characteristics of the problem) a family of sets for which the OR of binary sequences corresponding to the variable x[j] included in the set is everything. Finding the smallest such set (thus minimizing the number of sequences included) is relatively easy if ZDD was brief. Details can be found in the "Programming Guide" section of Chapter 7.1.4 (Volume 4A).

It is also easy to adapt to the exact coverage by taking a family of sets, so there is exactly 1 for each position.

+1
source

Source: https://habr.com/ru/post/1433204/


All Articles