Efficiently remove arrays that are close to each other, given a threshold, in Python

Question


I am using Python for this task and, to be objective, I want to find a "pythonic" way to remove "duplicates" from an array of arrays, where two arrays count as duplicates if they are closer to each other than a threshold. For example, given this array:

[[ 5.024,  1.559,  0.281], [ 6.198,  4.827,  1.653], [ 6.199,  4.828,  1.653]]

Note that [ 6.198, 4.827, 1.653] and [ 6.199, 4.828, 1.653] are really close to each other: their Euclidean distance is 0.0014, so they are almost "duplicates". I want my final result to be simply:

[[ 5.024,  1.559,  0.281], [ 6.198,  4.827,  1.653]]

The algorithm I have now is:

to_delete = []
for i in unique_cluster_centers:
    for ii in unique_cluster_centers:
        if i == ii:
            pass
        elif np.linalg.norm(np.array(i) - np.array(ii)) <= self.tolerance:
            to_delete.append(ii)
            break

for i in to_delete:
    try:
        uniques.remove(i)
    except:
        pass

But it is very slow. I would like to know a faster and more "pythonic" way to solve this problem. My tolerance is 0.0001.


python numpy duplicates distance

Pj- Mar 26 '17 at 22:50


Willem Van Onsem · Answer 1 · 2017-03-26T23:01:15+0000

A generic approach might be:

def filter_quadratic(data,condition):
    result = []
    for element in data:
        if all(condition(element,other) for other in result):
            result.append(element)
    return result

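This is a filter that maintains a result list, which starts out empty. An element is added only if the condition holds between it and every element *already* in the list.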

The condition can then be defined as:

def the_condition(xs,ys):
    # working with squares, 2.5e-05 is 0.005*0.005 
    return sum((x-y)*(x-y) for x,y in zip(xs,ys)) > 2.5e-05

This gives:

>>> filter_quadratic([[ 5.024,  1.559,  0.281], [ 6.198,  4.827,  1.653], [ 6.199,  4.828,  1.653]],the_condition)
[[5.024, 1.559, 0.281], [6.198, 4.827, 1.653]]

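The algorithm runs in O(n²), where n is the number of elements passed to the function. It can be made more efficient with a k-d tree, for instance, but that requires a more advanced data structure.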
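The k-d tree idea mentioned above can be sketched in pure Python as follows. This is only an illustrative sketch, not code from the answer: the names `filter_kdtree`, `_has_close` and `_insert` are hypothetical, the tree is built incrementally and never rebalanced, and `math.dist` assumes Python 3.8 or later. It keeps the same rule as `filter_quadratic`: a point is kept only if it is farther than the threshold from every point already kept.

```python
import math


def _has_close(node, point, threshold, depth):
    """Return True if the subtree holds a point within `threshold` of `point`."""
    if node is None:
        return False
    node_point, left, right = node
    if math.dist(node_point, point) <= threshold:
        return True
    axis = depth % len(point)
    diff = point[axis] - node_point[axis]
    # Search the side of the splitting plane that contains the query first.
    near, far = (left, right) if diff < 0 else (right, left)
    if _has_close(near, point, threshold, depth + 1):
        return True
    # The far side can only hold a match if the plane is within the threshold.
    return abs(diff) <= threshold and _has_close(far, point, threshold, depth + 1)


def _insert(node, point, depth):
    """Insert `point` into the (unbalanced) k-d tree and return the subtree root."""
    if node is None:
        return [point, None, None]
    axis = depth % len(point)
    if point[axis] < node[0][axis]:
        node[1] = _insert(node[1], point, depth + 1)
    else:
        node[2] = _insert(node[2], point, depth + 1)
    return node


def filter_kdtree(points, threshold):
    """Keep each point that is farther than `threshold` from all kept points."""
    root = None
    kept = []
    for point in points:
        if not _has_close(root, point, threshold, 0):
            kept.append(point)
            root = _insert(root, point, 0)
    return kept


data = [[5.024, 1.559, 0.281], [6.198, 4.827, 1.653], [6.199, 4.828, 1.653]]
print(filter_kdtree(data, 0.005))
```

Because the tree is never rebalanced, already-sorted input can degrade it to a linked list (and hit Python's recursion limit for long inputs); shuffling the input or using a library implementation would likely be a better choice for large data sets.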

Sergey · Answer 2 · 2017-03-26T23:25:42+0000

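To avoid comparing every row with every other row (which is O(n^2)), you can use a different approach.

Round the values of each row and use the rounded tuple as a key in a dictionary, so that rows that are "close" end up under the same key. Only the first row seen for each key is kept. Note that this is not exactly the same as a distance threshold: two rows that are very close can still round to different keys.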

def remove_near_duplicates(unique_cluster_centers):
    # (the function name is illustrative; it only wraps the loop so `return` is valid)
    result = {}
    for row in unique_cluster_centers:
        # round each value to 2 decimal places:
        # [5.024,  1.559,  0.281] => (5.02,  1.56,  0.28)
        # you can be inventive and, say, multiply each value by 3 before rounding
        # if you want precision other than a whole decimal point.
        key = tuple([round(v, 2) for v in row])  # tuples can be keys of a dict
        if key not in result:
            result[key] = row
    return list(result.values())  # I suppose the order of the items is not important, you can use OrderedDict otherwise
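A related variant, sketched here under stated assumptions, buckets points into a uniform grid with cells as wide as the threshold and also checks the neighbouring cells, so that close points on opposite sides of a cell boundary are still detected. This is a spatial-hashing sketch rather than the rounding method from the answer; the name `filter_grid` is hypothetical, the threshold is assumed to be positive, and `math.dist` assumes Python 3.8 or later.

```python
import itertools
import math


def filter_grid(points, threshold):
    """Keep each point that is farther than `threshold` from all kept points."""
    grid = {}  # maps a cell index tuple to the kept points inside that cell
    kept = []
    for point in points:
        # Cell edges are `threshold` wide, so any point within `threshold`
        # lies in the same cell or in one of the directly adjacent cells.
        cell = tuple(math.floor(v / threshold) for v in point)
        is_close = any(
            math.dist(point, other) <= threshold
            for offset in itertools.product((-1, 0, 1), repeat=len(cell))
            for other in grid.get(tuple(c + o for c, o in zip(cell, offset)), ())
        )
        if not is_close:
            kept.append(point)
            grid.setdefault(cell, []).append(point)
    return kept


data = [[5.024, 1.559, 0.281], [6.198, 4.827, 1.653], [6.199, 4.828, 1.653]]
print(filter_grid(data, 0.005))
```

Each point inspects 3^d cells for d-dimensional data (27 for the 3-D rows here), so the approach is attractive for low dimensions and becomes impractical as d grows.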
