Algorithm for finding the best combination of elements under certain restrictions

I will try to explain the problem in mathematical language.
Suppose I have a set of elements X = {x_1, x_2, ..., x_n} . Each element of X belongs to one of the sets S_1, S_2, ..., S_5 . I consider all subsets of X consisting of 5 elements: {x_i1, x_i2, ..., xi5} so x_i1 belongs to S_1 , ..., x_i5 corresponds to S_5 .
Some subsets are considered correct, and some are considered incorrect. A subset is considered correct if it does not contain conflicting elements. I have a function f1 to determine if a pair of elements is conflicting or not.
I also have a function f2 that can compare such correct subsets and say which subset is better (they can also be equal).
I need to find the best non-conflict subsets (s).

Algo I used:
I built all the subsets, discarded the wrong subsets. Then I sorted the correct subsets using f2 as the sort function, and took the first best subsets (I used the quick sort algorithm). Since there were a huge number of subsets, this procedure did not take enough time.

Is there a better approach in terms of time consumption?

UPDATED
Let's think about x_i as if it is an interval with integer endpoints. f1 returns true if the two intervals do not intersect and false otherwise. f2 compares the sums of the lengths of the intervals in the subsets.

+3
source share
7 answers

This problem is a variant of the maximum weighted interval planning algorithm. The DP algorithm has polynomial complexity O(N*log(N)) with the space O(N) for the naive problem and O(2^G * N * logn(N)) complexity with the space O(2^G * N) for this variation problems, where G , N represents the total number of groups / subsets (5 here) and intervals, respectively.

If x_i does not represent intervals, then the problem is in NP, as other solutions have proved.

Let me first explain the dynamic programming solution for planning the maximum weighted interval, and then solve the change problem.

  • We are given the starting and ending points of the intervals. Let start(i) , end(i) , weight(i) be the start, end point, the length of the interval interval i respectively.
  • Sort intervals based on ascending start point order.
  • Let the sorted order of intervals 1, 2, ... N
  • Let next(i) represent the next interval that does not overlap with interval i .
  • Allows you to define subtask S(i) as the maximum weighted interval only taking into account tasks i, i+1, ... N
  • S(1) is a solution that considers all tasks from 1,2,... N and returns the maximum weighted interval.
  • Now let's define S(i) recursively.

.

 S(i) = weight(i) if(i==N) // last job = max(weight(i)+S(next(i)), S(i+1) 

The complexity of this solution is O(N*log(N) + N) . N*log(N) to find next(i) for all tasks and N to solve subtasks. Space O(N) for saving solutions to subtasks.

Now let's resolve the change to this problem.

  • Let's look at all the intervals in X. Each interval belongs to one of the sets S_1, ... S_5.
  • Let start(i) , end(i) , weight(i) , subset(i) be the start, end point, interval length, subset of interval i respectively.
  • Sort intervals based on ascending start point order.
  • Let the sorted order of intervals 1, 2, ... N
  • Let next(i) represent the next interval that does not overlap with interval i .
  • Allows you to define subtask S(i, pending) as the maximum weighted interval only taking into account tasks i, i+1, ... N , and pending is a list of subsets from which we must choose one interval each.
  • S(1, {S_1,...S_5}) is a solution that considers all tasks 1,...N , selects one interval for each of S_1,...S_5 and returns the maximum weighted interval.
  • Now let's recursively define S(i) as follows.

.

 S(i, pending) = 0 if(pending==empty_set) // possible combination = -inf if(i==N && pending!={group(i)}) // incorrect combination = S(i+1, pending) if(group(i) not element of pending) = max(weight(i)+S(next(i), pending-group(i)), S(i+1, pending) 

Please note that I may have missed some basic cases.

The complexity of this algorithm is O(2^G * N * logn(N)) with the space O(2^G * N) . 2^G * N represents the size of the subtask.

As an estimate, for small values ​​of G<=10 and high values ​​of N>=100000 this algorithm works quite quickly. For average values G>=20 , N<=10000 must also be low for this algorithm to converge. And for large values G>=40 , the algorithm does not converge.

+1
source

Without further qualification of domains and an evaluation function, this problem can be easily proved to be NP-complete if SAT is imposed on it (i.e., let S_1, ..., S_5 be (true, false) and f2 = 1 if the formula is filled and 0 if not). Therefore, in this case, even without taking into account f1, you are out of luck.

If you know more about the actual structure of f1 and f1, you might be more fortunate. See Downtime Satisfaction Problems for what to look for in f1 and f2.

+1
source

Let's think about x_i as if it were an interval with integer endpoints. f1 returns true if the two intervals do not intersect and false otherwise. f2 compares the sums of the lengths of the intervals in the subsets.

If I understand correctly, this means that we can assign a value (its length) to each x_i from X. Then there is no need to evaluate f2 for each possible solution / subset.

It is very unlikely that the smallest 5 x_i forms the best subset. Depending on the evidence, the best subset may be the 5 largest intervals. Therefore, I suggest sorting X by value. The general idea is to start with the highest x and try to add more x (the highest first) until you get 5 non-overlapping ones. Most likely, you will find the best subset before even generating a fraction of all the possible subsets (depending on the specific problem, of course). But in the worst case, this is not faster than your decision.

+1
source

If we set aside the condition to take one x from each S_i, this problem is equivalent to the maximum weight, an independent set in the graph interval (i.e., finding the set of maximum weight of pairwise unconnected vertices in the graph, where the vertices represent the intervals and the vertices are connected if corresponding intervals overlap). This problem can be solved in polynomial time. The version here also has a color for each vertex, and the selected vertices must have different colors. I'm not sure how to solve this in polynomial time, but you can use the fact that there are not so many colors: make a dynamic programming table T [C, x], where C is the set of colors and x is the position of the endpoint of the interval. T [C, x] should contain the maximum weight you can get from | C | intervals with colors in C to the left of x. Then you can fill out the table from left to right. This should be feasible since there are only 2 ^ 5 = 32 sets of colors.

+1
source

I have a solution that should be good if my understanding of your question is correct: Therefore, I begin by understanding

 each Integer is actually an interval from I1 to I2 and a Set is a combination of such intervals. A Set is correct if none of the intervals are intersecting and Set1>Set2 if the sum of Intervals in S1> sum of Intervals in S2. 

So what I would do in this situation would be something in these lines.

  • When comparing intervals, to determine if they intersect, do this.

    a) Sorting intervals in order of starting points

    b) compare the end point of the first and the starting point of consecutive intervals to determine the overlap. Store an integer named gap, and if the beginning and end of two intervals do not overlap the difference in increment with their difference.

This will automatically give you the sum of the intervals in the set by doing Endpoint (lastI) -Startpoint (firstI) - Gap.

=> If you need only the best, you can take one max variable and continue comparing sets as they appear.

=> If you need top5 or something, then follow below, otherwise skip.

  • As soon as you receive the amount and the set is correct, add the amount to the "MinHeap" of 5 elements. The first 5 elements will go as they are. Basically you track the top 5 items. When the new set is less than the minus of the heap "Do nothing and ignore this set because it is smaller than the top 5 sets", when the set is more than min (which means that it is in the top 5), replace min and sift the element down. holding mines from above 5 from above. It will always contain the top 5 items in the heap.

  • Now that you have the 5 best elements, you can easily determine the best with 5 pop-ups. :)

Note. If the intervals are random, this will lead you to a solution of O (n ^ 2), and then each comparison will have 4 statements to check the positions of the overlap. you can sort the intervals in O (nlogn) and then go through the list once to determine the overlap (nlogn + n = nlogn), while getting the 5 best sets. This should improve your productivity and time.

.

+1
source

I have no answer because you asked a very abstract question, but I will give you an idea.

Try thinking multiThreading. For example, you can create a thread pool with a limited number of threads. Then find a solution for recursion and run a new task for each loop when you dive inside.

I say how you can divide this problem into many small tasks, as your algorithm will be better.

Think problematically not mathematically!

0
source

Consider using a lookup table to optimize f1 time. Consider inserting the subsets that you open into the sorted merge list, rather than quicksorting at the end. If the domain is small and finite, you can implement very fast merge sorts by filling in sparse arrays.

0
source

Source: https://habr.com/ru/post/1392360/


All Articles