Select a subset of the farthest points in a given set of points

Question

Select a subset of the farthest points in a given set of points

Imagine you are given a set S of n points in three dimensions. The distance between any two points is a simple Euclidean distance. You want to select a subset of Q from k points from this set so that they are farthest from each other. In other words, there is no other subset Q of k points such that min of all pairwise distances in Q is less than in Q.

If n is about 16 million and k is about 300, how do we effectively do this?

I guess this NP is hard, maybe we just want to focus on the approximation. One idea that I can come up with is to use multidimensional scaling to sort these points in a row, and then use the binary search version to get the points farthest from each other on this line.

+5

python algorithm computational-geometry dimensionality-reduction

Shitalshah Feb 22 '18 at 10:29

source share

2 answers

Saibot · Answer 1 · 2018-02-22T11:48:29+0000

I am also sure that the problem is NP-Hard, the most similar problem I found is k-Center Problem . If runtime is more important than correctness, a greedy algorithm is probably your best bet:

Q ={} while |Q| < k Q += p from S where mindist(p, Q) is maximal

Note: for similar problems, for example, a problem with the cover , you can show that the solution from the greedy algorithm is at least 63% and the optimal solution.

To speed things up, I see 3 possibilities:

Index your dataset in R-Tree first , then do a greedy search. Building an R-tree is O (n log n), but although it is designed to find the closest neighbor, it can also help you find the farthest point for the set of points in O (log n). It can be faster than the naive O (k * n) algorithm.
Choose a subset of your 16 million points and execute the greedy algorithm on the subset. You get close anyway so you can save a little more accuracy. You can also combine this with algorithm 1.
Use an iterative approach and stop when you don't have time. The idea here is to randomly select k points from S (allows us to call this set Q '). Then at each step you switch the point p_ from Q ', which has a minimum distance to another in Q' with a random point from S. If the resulting set Q '' is better to continue Q '', otherwise repeat with Q ', so as not to get stuck , you can choose a different point from Q 'than p_ if you could not find an adequate replacement for several iterations.

Paddy3118 · Answer 2 · 2018-02-22T10:47:39+0000

If you can afford to do ~ k * n distance calculations, you could

Find the center of the distribution of points.
Select the point farthest from the center. (and remove it from the set of points not selected).
Find the point that is far from all the currently selected points, and select it.
Repeat 3. until you're done with k dots.

Select a subset of the farthest points in a given set of points

More articles: