This is an optimization problem. Since you do not need an optimal solution, you can try a stochastic optimization method such as hill climbing, in which you start with a random solution (a random subset taken as the initial R) and look at a set of neighboring solutions (each obtained by adding or removing one component of the current solution), moving to whichever one is better according to a cost function.
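For illustration only, here is a minimal hill-climbing sketch in Python; U, S, and the cost() function are placeholders you would replace with your own data and objective, and the code shows only the general shape of the algorithm:

```python
import random

def hill_climb(U, S, cost, max_steps=1000):
    """Greedy hill climbing over size-S subsets of U, represented as index lists."""
    R = random.sample(range(len(U)), S)          # random initial solution
    best_cost = cost(U, R)
    for step in range(max_steps):
        improved = False
        # Neighbors: swap one chosen index for one unchosen index.
        outside = [i for i in range(len(U)) if i not in R]
        for pos in range(S):
            for cand in outside:
                neighbor = R[:pos] + [cand] + R[pos + 1:]
                c = cost(U, neighbor)
                if c < best_cost:                # take the first improving move
                    R, best_cost, improved = neighbor, c, True
                    break
            if improved:
                break
        if not improved:                         # no better neighbor: local minimum
            break
    return R, best_cost
```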
To get better solutions, you can also add simulated annealing to your hill-climbing search. The idea is that sometimes you need to move to a worse solution first in order to reach a better one later. Simulated annealing works better because it allows moves to worse solutions early in the process, and the algorithm becomes less and less likely to accept such moves as the process continues.
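Here is a sketch of how simulated annealing changes the acceptance rule; this is again only an illustration, with a hypothetical geometric temperature schedule, and t0, cooling, and steps would need tuning for your data:

```python
import math
import random

def anneal(U, S, cost, t0=1000.0, cooling=0.995, steps=20000):
    """Simulated annealing over size-S index subsets of U (sketch only)."""
    R = random.sample(range(len(U)), S)
    best_R, best_cost = R[:], cost(U, R)
    cur_cost, t = best_cost, t0
    for _ in range(steps):
        # Random one-swap neighbor.
        pos = random.randrange(S)
        cand = random.choice([i for i in range(len(U)) if i not in R])
        neighbor = R[:pos] + [cand] + R[pos + 1:]
        c = cost(U, neighbor)
        # Always accept better moves; accept worse moves with probability exp(-delta / t).
        if c < cur_cost or random.random() < math.exp((cur_cost - c) / t):
            R, cur_cost = neighbor, c
            if c < best_cost:
                best_R, best_cost = R[:], c
        t *= cooling                             # temperature shrinks, worse moves become rarer
    return best_R, best_cost
```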
I have put some sample hill-climbing code that solves your problem here: https://gist.github.com/921f398d61ad351ac3d6
In the sample code, R is always stored as a list of indices into U, and I use Euclidean distance to measure the similarity between neighbors; of course, you can use another distance function that suits your own needs. Also note that the code generates neighboring solutions on the fly. If you have a large pool of vectors in U, you may want to cache precomputed neighbors, or even consider locality-sensitive hashing to avoid O(n^2) comparisons. Simulated annealing can be added on top of this code.
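As an example, a placeholder cost function based on Euclidean distance could look like the sketch below; the actual objective depends on your problem, so treat this purely as an assumption. If U is large, you would precompute a distance matrix or cache these values rather than recomputing them on every evaluation:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cost(U, R):
    """Placeholder objective: total pairwise distance among the chosen vectors."""
    return sum(euclidean(U[i], U[j]) for k, i in enumerate(R) for j in R[k + 1:])
```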
The result of one random run is shown below.
I use only 20 vectors in U and S = 10, so that the result can be compared with the optimal solution. The hill climbing stops at the 4th step, when no neighboring solution (differing by a single replaced member) gives a better move. I also ran an exhaustive search that iterates over all possible combinations. You can see that the hill-climbing result is quite good compared to the exhaustive one: it takes only 4 steps to reach a relatively low cost (albeit a local minimum), which the exhaustive search needs more than 82 thousand steps to beat.
initial R [1, 3, 4, 5, 6, 11, 13, 14, 15, 17]
hill-climbing cost at step 1: 91784
hill-climbing cost at step 2: 89574
hill-climbing cost at step 3: 88664
hill-climbing cost at step 4: 88503
exhaustive search cost at step 1: 94165
exhaustive search cost at step 2: 93888
exhaustive search cost at step 4: 93656
exhaustive search cost at step 5: 93274
exhaustive search cost at step 10: 92318
exhaustive search cost at step 44: 92089
exhaustive search cost at step 50: 91707
exhaustive search cost at step 84: 91561
exhaustive search cost at step 99: 91329
exhaustive search cost at step 105: 90947
exhaustive search cost at step 235: 90718
exhaustive search cost at step 255: 90357
exhaustive search cost at step 8657: 90271
exhaustive search cost at step 8691: 90129
exhaustive search cost at step 8694: 90048
exhaustive search cost at step 19637: 90021
exhaustive search cost at step 19733: 89854
exhaustive search cost at step 19782: 89622
exhaustive search cost at step 19802: 89261
exhaustive search cost at step 20097: 89032
exhaustive search cost at step 20131: 88890
exhaustive search cost at step 20134: 88809
exhaustive search cost at step 32122: 88804
exhaustive search cost at step 32125: 88723
exhaustive search cost at step 32156: 88581
exhaustive search cost at step 69336: 88506
exhaustive search cost at step 82628: 88420