Algorithm for selecting s vectors from a set while minimizing a cost function

if we have:

  • a set U of n-dimensional vectors (each vector v = <x1, x2, ..., xn>)
  • an n-dimensional constraint vector c = <x1, ..., xn>
  • an n-dimensional vector of weights w = <x1, ..., xn>
  • an integer s

I need an algorithm that selects s vectors from U into a set R, minimizing the cost function cost(R):

cost(R) = sum(abs(c-sumVectors(R))*w) 

(sumVectors is a function that sums vectors component-wise, e.g. sumVectors({<1,2>; <3,4>}) = <4,6>, and sum(<1,2,3>) returns the scalar 6)
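
For concreteness, here is a minimal Python sketch of that cost function, assuming the vectors are plain Python lists (the names sum_vectors and cost simply mirror the definitions above):

    def sum_vectors(R):
        # component-wise sum: sum_vectors([[1, 2], [3, 4]]) == [4, 6]
        return [sum(components) for components in zip(*R)]

    def cost(R, c, w):
        # sum(abs(c - sumVectors(R)) * w), computed component by component
        s = sum_vectors(R)
        return sum(abs(ci - si) * wi for ci, si, wi in zip(c, s, w))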

The solution does not have to be optimal. I just need the best guess I can get in the time available.

Any idea where to start? (Preferably something faster / smarter than genetic algorithms)

3 answers

This is an optimization problem. Since you do not need an optimal solution, you can try a stochastic optimization method such as Hill Climbing, in which you start with a random solution (a random subset R) and examine neighboring solutions (formed by adding, removing, or swapping one component of the current solution), moving to whichever has a better cost.
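
A minimal sketch of that kind of hill climbing (my own illustration, not the gist linked below), assuming the cost helper sketched in the question and a single-swap neighborhood:

    import random

    def hill_climb(U, c, w, s, max_steps=1000):
        # start from a random subset of s indices into U
        current = random.sample(range(len(U)), s)
        current_cost = cost([U[i] for i in current], c, w)
        for _ in range(max_steps):
            best_move, best_cost = None, current_cost
            for pos in range(s):                        # try swapping each member...
                for cand in range(len(U)):              # ...for each unused vector
                    if cand in current:
                        continue
                    neighbor = current[:pos] + [cand] + current[pos + 1:]
                    neighbor_cost = cost([U[i] for i in neighbor], c, w)
                    if neighbor_cost < best_cost:
                        best_move, best_cost = neighbor, neighbor_cost
            if best_move is None:                       # no improving neighbor: local optimum
                break
            current, current_cost = best_move, best_cost
        return current, current_cost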

To get a better solution, you can also add Simulated Annealing to your hill-climbing search. The idea is that in some cases it is necessary to move to a worse solution in order to reach a better one later. Simulated annealing works better because it allows moves to worse solutions early in the process, and the algorithm becomes less and less likely to accept a worse solution as the process continues.
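
Here is a sketch of how the annealing acceptance rule could be grafted onto the same swap move (the temperature T and cooling rate alpha are illustrative choices, not tuned values):

    import math
    import random

    def anneal(U, c, w, s, T=1000.0, alpha=0.995, steps=20000):
        current = random.sample(range(len(U)), s)
        current_cost = cost([U[i] for i in current], c, w)
        best, best_cost = current[:], current_cost
        for _ in range(steps):
            pos = random.randrange(s)                   # random single-swap move
            cand = random.choice([i for i in range(len(U)) if i not in current])
            neighbor = current[:pos] + [cand] + current[pos + 1:]
            neighbor_cost = cost([U[i] for i in neighbor], c, w)
            delta = neighbor_cost - current_cost
            # always accept improvements; accept worse moves with prob. exp(-delta/T)
            if delta < 0 or random.random() < math.exp(-delta / T):
                current, current_cost = neighbor, neighbor_cost
                if current_cost < best_cost:
                    best, best_cost = current[:], current_cost
            T *= alpha                                  # cool down: worse moves become rarer
        return best, best_cost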

I am pasting some sample code for hill climbing to solve your problem here: https://gist.github.com/921f398d61ad351ac3d6

My code example always stores R as a list of indices into U, and I use Euclidean distance to compare similarity between neighbors. You can of course use another distance function that suits your needs. Also note that in the code I compute neighbors on the fly. If you have a large vector pool in U, you may need to cache precomputed neighbors, or even consider locality-sensitive hashing to avoid O(n^2) comparisons. Simulated annealing can be added on top of the code above.
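
As a rough sketch of the caching idea (a brute-force O(n^2) pass that locality-sensitive hashing would replace for a really large U):

    def precompute_neighbors(U, k):
        # for every vector, cache the indices of its k nearest neighbors
        def dist2(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        neighbors = []
        for i, v in enumerate(U):
            ranked = sorted((j for j in range(len(U)) if j != i),
                            key=lambda j: dist2(v, U[j]))
            neighbors.append(ranked[:k])
        return neighbors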

The result of one random run is shown below.

I use only 20 vectors in U and s = 10 so that I can compare the result with the optimal solution. The hill-climbing process stops at the 4th step, when there is no better move left that replaces a single member with one of its k nearest neighbors.

I also ran an exhaustive search that iterates through all possible combinations. You can see that the hill-climbing result is pretty good compared to the exhaustive approach: it takes only 4 steps to reach a relatively low cost (albeit a local minimum), which the exhaustive search needs more than 82 thousand steps to beat.

    initial R [1, 3, 4, 5, 6, 11, 13, 14, 15, 17]
    hill-climbing cost at step 1: 91784
    hill-climbing cost at step 2: 89574
    hill-climbing cost at step 3: 88664
    hill-climbing cost at step 4: 88503
    exhaustive search cost at step 1: 94165
    exhaustive search cost at step 2: 93888
    exhaustive search cost at step 4: 93656
    exhaustive search cost at step 5: 93274
    exhaustive search cost at step 10: 92318
    exhaustive search cost at step 44: 92089
    exhaustive search cost at step 50: 91707
    exhaustive search cost at step 84: 91561
    exhaustive search cost at step 99: 91329
    exhaustive search cost at step 105: 90947
    exhaustive search cost at step 235: 90718
    exhaustive search cost at step 255: 90357
    exhaustive search cost at step 8657: 90271
    exhaustive search cost at step 8691: 90129
    exhaustive search cost at step 8694: 90048
    exhaustive search cost at step 19637: 90021
    exhaustive search cost at step 19733: 89854
    exhaustive search cost at step 19782: 89622
    exhaustive search cost at step 19802: 89261
    exhaustive search cost at step 20097: 89032
    exhaustive search cost at step 20131: 88890
    exhaustive search cost at step 20134: 88809
    exhaustive search cost at step 32122: 88804
    exhaustive search cost at step 32125: 88723
    exhaustive search cost at step 32156: 88581
    exhaustive search cost at step 69336: 88506
    exhaustive search cost at step 82628: 88420

To be sure of the minimum, you would need to check the costs of all possible sets R and take the smallest. If you instead greedily select the minimum-cost vector at each addition, step by step, you may not end up with the minimum-cost set. But if the set U is very large and the full calculation is too slow, you may be forced to use that step-by-step (greedy) method.
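
A minimal sketch of that step-by-step (greedy) method, reusing the cost helper sketched in the question:

    def greedy_select(U, c, w, s):
        # add, one at a time, whichever vector gives the lowest cost so far
        chosen, remaining = [], list(range(len(U)))
        for _ in range(s):
            best_i = min(remaining,
                         key=lambda i: cost([U[j] for j in chosen + [i]], c, w))
            chosen.append(best_i)
            remaining.remove(best_i)
        return chosen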


Your problem is essentially combinatorial optimization. Such problems are hard to solve, but I can offer a couple of suggestions. Both are based on the idea that since you cannot explore all combinations, you should instead explore the neighborhood of greedily optimal solutions.

There is a very general method called beam search, a heuristic that modifies best-first search to work with limited memory (the beam width). It relies on the fact that for any given partial solution you can compute the score associated with adding some new member to the set, as well as the score of the current set (since you have a well-behaved objective function). The idea is that you start with the empty set, expand each retained state into its candidate successors, and once all are expanded, throw away all but the n best states and repeat. This gives you n candidate solutions, from which you pick the best-scoring one.
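
A sketch of what that beam search might look like here, with beam width n_beam (a hypothetical parameter name) and the cost helper from the question serving as the score:

    import heapq

    def beam_search(U, c, w, s, n_beam=10):
        beam = [[]]                                     # level 0: the empty set
        for _ in range(s):
            # expand every partial solution by one more vector (dedupe via sorted tuples)
            expanded = {tuple(sorted(partial + [i]))
                        for partial in beam
                        for i in range(len(U)) if i not in partial}
            # keep only the n_beam cheapest partial solutions
            beam = heapq.nsmallest(
                n_beam, (list(t) for t in expanded),
                key=lambda idx: cost([U[j] for j in idx], c, w))
        return min(beam, key=lambda idx: cost([U[j] for j in idx], c, w))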

However, this may not work well on its own: the nature of your objective function means the search will at first select vectors close to the constraint, and then (after a number of steps that depends on the relative scales of your weight vector and the component vectors) small vectors that shrink the remaining difference. If so, you can use the solution from this method to initialize a random-walk / simulated-annealing strategy (one that randomly adds or removes vectors from the set) to search for better solutions near the beam-search result.

