Average nearest coordinates in Python

Question

This is a continuation of my previous question. I now have a sorted list of coordinates in Euclidean space. I want to average the coordinates that lie close together, much as clustering would: each cluster is averaged and returned as a single point in Euclidean space. So, for example, the list below

a = [[ 42, 206],[ 45,  40],[ 45, 205],[ 46,  41],[ 46, 205],[ 47,  40],[ 47, 202],[ 48,  40],[ 48, 202],[ 49,  38]]

should return avg_coordinates = [[47.0, 39.8], [45.6, 204.0]]. This is obtained by averaging the five points that lie near each other (one cluster), and then the other five. Right now I am using a gradient approach: I go through all the coordinates, and wherever the gradient is higher than a set threshold, I treat it as the start of another cluster of points (because the list is already sorted). The problem arises when the denominator in the gradient formula gradient = (y2-y1)/(x2-x1) is large relative to the numerator, which gives a value lower than the threshold even though the points are far apart. So, logically, I am doing something wrong. Any good suggestions? Please note: I do not want to apply clustering.
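To make the failure concrete, here is a minimal sketch comparing the gradient with the Euclidean distance for one pair of points. The pair `p`, `q` and the helper names `gradient` and `distance` are made up for illustration and are not part of the data above.

```python
import math

def gradient(p, q):
    """Slope between two points; returns inf when x is unchanged."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.inf if dx == 0 else dy / dx

def distance(p, q):
    """Euclidean distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

# Hypothetical pair: far apart, but the x-difference is large as well
p, q = (0, 0), (200, 160)

print(gradient(p, q))   # small slope despite the large separation
print(distance(p, q))   # large distance
```

The slope stays below any threshold larger than 1 while the distance is over 250, so a slope test cannot tell that these points belong to different groups.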

python arrays numpy

muazfaiz Jan 28 '17 at 11:59

2 answers

See k-means in scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

Luca Citi Jan 28 '17 at 12:25

Divakar · Accepted Answer · 2017-01-28T12:22:20+0000

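Here's one approach -

import numpy as np

a = np.asarray(a)  # the list from the question, as an (N, 2) array

thresh = 100  # Threshold for splitting, heuristically chosen for the given sample

# Lex-sort of coordinates (np.lexsort treats the last key, here y, as primary)
b = a[np.lexsort(a.T)]

# Interval indices that partition the clusters: a new cluster starts wherever
# the distance between consecutive sorted points exceeds thresh
diff_idx = np.flatnonzero(np.linalg.norm(b[1:] - b[:-1], axis=1) > thresh) + 1
idx = np.hstack((0, diff_idx, b.shape[0]))

# Per-cluster sums and sizes, then the per-cluster means
sums = np.add.reduceat(b, idx[:-1])
counts = idx[1:] - idx[:-1]
out = sums / counts.astype(float)[:, None]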

Example input, output -

In [141]: a
Out[141]: 
array([[ 42, 206],
       [ 45,  40],
       [ 45, 205],
       [ 46,  41],
       [ 46, 205],
       [ 47,  40],
       [ 47, 202],
       [ 48,  40],
       [ 48, 202],
       [ 49,  38]])

In [142]: out
Out[142]: 
array([[  47. ,   39.8],
       [  45.6,  204. ]])
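For reuse, the same steps can be wrapped in a function. This is only a sketch under a few assumptions: the name `average_nearby` is made up here, the input holds at least one point, and `thresh` still has to be picked by hand for the data at hand.

```python
import numpy as np

def average_nearby(points, thresh):
    """Average runs of lex-sorted points separated by gaps larger than thresh."""
    pts = np.asarray(points, dtype=float)
    # np.lexsort uses the last key (the last column) as the primary sort key
    pts = pts[np.lexsort(pts.T)]
    # Euclidean distance between consecutive sorted points
    gaps = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    # A new group starts right after every gap that exceeds the threshold
    starts = np.concatenate(([0], np.flatnonzero(gaps > thresh) + 1))
    sums = np.add.reduceat(pts, starts, axis=0)
    counts = np.diff(np.append(starts, len(pts)))
    return sums / counts[:, None]

a = [[42, 206], [45, 40], [45, 205], [46, 41], [46, 205],
     [47, 40], [47, 202], [48, 40], [48, 202], [49, 38]]

avg = average_nearby(a, thresh=100)
print(avg)
```

Because only consecutive points in the sorted order are compared, this works when the groups are well separated along the primary sort axis, as in the sample; it is not a general replacement for a clustering algorithm.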
