Sort a list of floating point numbers in groups

I have an unordered array of floating point numbers. I know that the values always fall near a few points, but those points are unknown. To illustrate, this list

[10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99] 

has values grouped around 5 and 10, so I would like the answer to be [5, 10].

I would like to find these clusters for lists with 1000+ values, where the number of clusters is probably around 10 (for some given tolerance). How can I do this efficiently?

+6
2 answers

Check out python-cluster.

Using this library, you can do something like this:

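    from cluster import *

    data = [10.01, 5.001, 4.89, 5.1, 9.9, 10.1, 5.05, 4.99]
    # The distance between two values is their absolute difference
    cl = HierarchicalClustering(data, lambda x, y: abs(x - y))
    # Cut the hierarchy at distance 1.0 and average each cluster
    print([mean(cluster) for cluster in cl.getlevel(1.0)])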

And you will get:

 [5.0062, 10.003333333333332] 

(This is a very naive example, because I do not really know what you want to do, and because this is the first time I have used this library.)

+13

You can try the following method:

Sort the array first, then use np.diff() to calculate the difference between consecutive values. Any difference that exceeds the threshold marks a split position:

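    import numpy as np

    x = [10.01, 5.001, 4.89, 5.1, 9.9, 10.1, 5.05, 4.99]
    x = np.sort(x)
    th = 0.5  # gaps larger than this separate two groups
    # Split after every index where the gap to the next value exceeds th
    print([group.mean() for group in np.split(x, np.where(np.diff(x) > th)[0] + 1)])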

Result:

 [5.0061999999999998, 10.003333333333332] 
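If you would rather avoid the NumPy dependency, the same sort-and-split idea can be sketched in plain Python. This is only a rough sketch: the function name `cluster_centers` and the default `threshold` of 0.5 are made up for illustration, and it assumes that the gap between neighbouring clusters is always larger than the spread inside a cluster.

```python
from statistics import mean


def cluster_centers(values, threshold=0.5):
    """Sort the values, start a new group wherever the gap between
    neighbours exceeds `threshold`, and return the mean of each group."""
    if not values:
        return []
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > threshold:
            groups.append([cur])    # gap too large: open a new group
        else:
            groups[-1].append(cur)  # still within the current group
    return [mean(g) for g in groups]


data = [10.01, 5.001, 4.89, 5.1, 9.9, 10.1, 5.05, 4.99]
centers = cluster_centers(data)
print(centers)
```

The sort dominates the cost, so this runs in O(n log n) and should be comfortable for lists of 1000+ values. The threshold plays the role of the tolerance mentioned in the question and has to be chosen for your data.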
+2

Source: https://habr.com/ru/post/902105/