1D Number Array Clustering

Question


Possible duplicate:
Cluster one-dimensional data optimally?

So let's say I have an array like this:

[1,1,2,3,10,11,13,67,71]

Is there a convenient way to split the array into something like this?

 [[1,1,2,3],[10,11,13],[67,71]]

I looked at similar questions, but most people suggested using k-means to cluster the points, for example with scipy, which is quite confusing for a beginner like me. Also, I think k-means is more suitable for clustering in two or more dimensions? Is there a way to partition an array of N numbers into several partitions/clusters depending on the numbers themselves?

Some people also suggest rigid range partitioning, but that does not always give the expected results.


arrays dimension cluster-analysis data-mining partition-problem

EH Jul 16 '12 at 22:25


2 answers

You can look into discretization algorithms. The 1D discretization problem is very similar to what you are asking. These algorithms decide on cut-off points according to frequency, binning strategy, and so on.

Weka uses the following algorithms in its discretization process.

weka.filters.supervised.attribute.Discretize
uses either Fayyad & Irani's MDL method or Kononenko's MDL criterion
weka.filters.unsupervised.attribute.Discretize
uses simple binning
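
To make "simple binning" concrete, here is a minimal sketch of equal-width binning in plain Python. It assumes that "simple binning" means equal-width bins. The function name `equal_width_bins` and the choice of three bins are illustrative assumptions, and this is not Weka's implementation.

```python
def equal_width_bins(values, n_bins):
    """Put each value into one of n_bins equally wide intervals
    spanning [min(values), max(values)]. Illustrative helper only."""
    ordered = sorted(values)
    if not ordered:
        return [[] for _ in range(n_bins)]
    lo, hi = ordered[0], ordered[-1]
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in ordered:
        if width == 0:
            idx = 0  # all values identical: everything goes in the first bin
        else:
            # the maximum would land in bin n_bins, so clamp it to the last bin
            idx = min(int((v - lo) / width), n_bins - 1)
        bins[idx].append(v)
    return bins


print(equal_width_bins([1, 1, 2, 3, 10, 11, 13, 67, 71], 3))
# → [[1, 1, 2, 3, 10, 11, 13], [], [67, 71]]
```

On the array from the question, the fixed-width bins merge the first two groups and leave one bin empty, which illustrates why rigid ranges do not always give the expected result.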


Atilla Ozgur Jul 18 '12 at 10:14


Anony-Mousse · Accepted Answer · 2012-07-17 05:38

Do not use multidimensional clustering algorithms for a one-dimensional problem. A single dimension is much more special than you might naively think, because you can actually sort it, which makes things a lot easier.
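
As a small illustration of how much sorting helps, the sketch below sorts the values and starts a new group whenever two neighbours are further apart than a threshold. This is only one simple heuristic, not a method named in this answer, and both `split_by_gap` and the `max_gap` value of 5 are hand-picked assumptions.

```python
def split_by_gap(values, max_gap):
    """Sort the values, then start a new group whenever the gap between
    two neighbours exceeds max_gap (a hand-picked, hypothetical threshold)."""
    ordered = sorted(values)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_gap:
            groups.append([cur])    # large jump: open a new group
        else:
            groups[-1].append(cur)  # small jump: stay in the current group
    return groups


print(split_by_gap([1, 1, 2, 3, 10, 11, 13, 67, 71], max_gap=5))
# → [[1, 1, 2, 3], [10, 11, 13], [67, 71]]
```

The weak point is the threshold itself: it has to be chosen per data set, which is what the statistical methods try to avoid.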

In fact, it is usually not even called clustering, but, for example, segmentation or natural breaks optimization.
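
One way to read "natural breaks optimization" is: given sorted data and a number of groups k, pick the cut points that minimize the total within-group sum of squared deviations. The following is a minimal dynamic-programming sketch of that idea. It is written for clarity rather than speed, the name `natural_breaks` is an assumption, k has to be supplied by the caller, and it should not be taken as the published Jenks algorithm.

```python
def natural_breaks(values, k):
    """Split sorted values into k contiguous groups that minimize the total
    within-group sum of squared deviations. Requires 1 <= k <= len(values)."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums let us compute the squared error of any slice in O(1).
    ps, ps2 = [0.0], [0.0]
    for x in xs:
        ps.append(ps[-1] + x)
        ps2.append(ps2[-1] + x * x)

    def sse(i, j):
        """Sum of squared deviations from the mean for xs[i:j]."""
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[g][j]: best cost of splitting the first j values into g groups.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):  # the last group is xs[i:j]
                c = cost[g - 1][i] + sse(i, j)
                if c < cost[g][j]:
                    cost[g][j] = c
                    back[g][j] = i

    # Walk the back-pointers from the end to recover the groups.
    groups, j = [], n
    for g in range(k, 0, -1):
        i = back[g][j]
        groups.append(xs[i:j])
        j = i
    return groups[::-1]


print(natural_breaks([1, 1, 2, 3, 10, 11, 13, 67, 71], 3))
# → [[1, 1, 2, 3], [10, 11, 13], [67, 71]]
```

Because the data is sorted, only contiguous groups need to be considered, which is what makes an exact search feasible in one dimension.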

You might want to look at Jenks Natural Breaks Optimization and similar statistical methods. Kernel Density Estimation (KDE) is also a good method to look at, with a strong statistical background. Local minima in density are good places to split the data into clusters, and there are statistical reasons for doing so. KDE is perhaps the soundest method for clustering one-dimensional data.
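
The idea of cutting at local density minima can be sketched in plain Python as follows. This is a rough illustration under stated assumptions: a Gaussian kernel, a hand-picked bandwidth of 3, and a fixed evaluation grid. The name `kde_split` is hypothetical, and a real analysis would choose the bandwidth more carefully.

```python
import math


def kde_split(values, bandwidth, grid_size=1000):
    """Estimate an (unnormalized) Gaussian kernel density on a grid and
    split the sorted values at the interior local minima of that density."""
    ordered = sorted(values)
    if not ordered:
        return []
    lo, hi = ordered[0], ordered[-1]
    if lo == hi:
        return [ordered]

    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    density = [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in ordered)
        for x in grid
    ]

    # Interior grid points that are lower than their left neighbour and
    # not higher than their right neighbour are treated as cut points.
    cuts = [
        grid[i]
        for i in range(1, grid_size - 1)
        if density[i] < density[i - 1] and density[i] <= density[i + 1]
    ]

    groups, current, idx = [], [], 0
    bounds = cuts + [math.inf]
    for v in ordered:
        while v > bounds[idx]:  # passed a cut point: close the current group
            if current:
                groups.append(current)
                current = []
            idx += 1
        current.append(v)
    groups.append(current)
    return groups


print(kde_split([1, 1, 2, 3, 10, 11, 13, 67, 71], bandwidth=3))
```

With this bandwidth the density has one minimum between 3 and 10 and another between 13 and 67, so the array from the question should come out as the three expected groups. A much smaller bandwidth would produce more groups and a much larger one fewer, so the bandwidth plays the role that the number of clusters plays in k-means.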

With KDE, it again becomes obvious that one-dimensional data is much better behaved. In 1D, you have local minima; but in 2D you may have saddle points and similar "maybe" splitting points. See the Wikipedia illustration of a saddle point for how such a point may or may not be appropriate for splitting clusters.
