I am also fairly sure that the problem is NP-hard; the most similar problem I found is the k-Center problem. If runtime is more important than exact optimality, a greedy algorithm is probably your best bet:
    Q = {}
    while |Q| < k:
        Q += the p from S where mindist(p, Q) is maximal
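A minimal sketch of that greedy loop in Python (stdlib only; the function name and the random choice of starting point are my own assumptions). It keeps an incrementally updated array of each point's distance to its nearest chosen point, so the whole thing runs in O(k * n):

```python
import math
import random

def greedy_max_min(points, k, seed=0):
    """Greedy farthest-point selection: repeatedly add the point of S
    whose minimum distance to the already-chosen set Q is maximal."""
    rng = random.Random(seed)
    first = rng.randrange(len(points))      # arbitrary starting point
    chosen = [first]
    # min_dist[i] = distance from points[i] to its nearest chosen point
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)                  # its own min_dist drops to 0,
        for i, p in enumerate(points):      # so it is never picked twice
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return [points[i] for i in chosen]
```

Each of the k iterations scans all n points once, which is what makes the naive version O(k * n).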
Note: for related problems, for example set cover, you can show that the greedy solution is within a factor of (1 - 1/e), i.e. at least about 63% of the optimal solution.
To speed things up, I see three possibilities:
1. Index your dataset in an R-tree first, then run the greedy search. Building the R-tree takes O(n log n); although it is designed for nearest-neighbor queries, it can also help you find the point farthest from a set of points in O(log n) per query. That can be faster than the naive O(k * n) algorithm.
2. Choose a random subset of your 16 million points and run the greedy algorithm on the subset. The greedy result is only approximate anyway, so sampling costs you just a little more accuracy. You can also combine this with approach 1.
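A sketch of the sampling idea, self-contained with its own small greedy loop (function names and the uniform-sampling choice are my assumptions, not from the original answer):

```python
import math
import random

def greedy_on_sample(points, k, sample_size, seed=0):
    """Run greedy farthest-point selection on a uniform random subset.
    Trades a little solution quality for a large speedup when the full
    dataset (e.g. 16M points) is too big to scan k times."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    chosen = [0]                              # arbitrary start point
    min_dist = [math.dist(p, sample[0]) for p in sample]
    while len(chosen) < k:
        nxt = max(range(len(sample)), key=min_dist.__getitem__)
        chosen.append(nxt)
        min_dist = [min(md, math.dist(p, sample[nxt]))
                    for md, p in zip(min_dist, sample)]
    return [sample[i] for i in chosen]
```

With a sample of, say, 100k points out of 16M, the greedy scan cost drops by two orders of magnitude.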
3. Use an iterative approach and simply stop when you run out of time. The idea: pick k points from S at random (call this set Q'). In each step, try to swap the point p of Q' that has the minimum distance to another point of Q' with a random point from S. If the resulting set Q'' is better, continue with Q''; otherwise keep Q'. To avoid getting stuck, you can swap out a different point of Q' than p if you could not find an adequate replacement for several iterations.
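The swap-based local search above can be sketched like this (the iteration budget, the "10 failed attempts" threshold, and all names are my own assumptions; a real anytime version would check a wall-clock deadline instead of a fixed `iters`). It assumes k >= 2 so the pairwise minimum is defined:

```python
import math
import random

def local_search(points, k, iters=1000, seed=0):
    """Iterative improvement for max-min dispersion: start from a random
    k-subset Q, repeatedly try to swap the point of Q closest to another
    chosen point for a random point of S, keeping only improving swaps."""
    rng = random.Random(seed)

    def min_pairwise(q):
        return min(math.dist(a, b)
                   for i, a in enumerate(q) for b in q[i + 1:])

    Q = rng.sample(points, k)
    best = min_pairwise(Q)
    stuck = 0
    for _ in range(iters):
        # Normally replace the point involved in the closest pair; after
        # several failed attempts, replace a random point instead so the
        # search does not get stuck.
        if stuck < 10:
            _, victim = min((min(math.dist(p, q) for q in Q if q is not p), i)
                            for i, p in enumerate(Q))
        else:
            victim = rng.randrange(k)
        cand = rng.choice(points)
        if cand in Q:
            continue
        trial = Q[:victim] + [cand] + Q[victim + 1:]
        score = min_pairwise(trial)
        if score > best:
            Q, best, stuck = trial, score, 0
        else:
            stuck += 1
    return Q
```

Because every intermediate Q is a valid solution, you can interrupt the loop at any point and return the best set found so far.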