1D Number Array Clustering

Possible duplicate:
Cluster one-dimensional data optimally?

So let's say I have an array like this:

[1,1,2,3,10,11,13,67,71] 

Is there a convenient way to split an array into something like this?

 [[1,1,2,3],[10,11,13],[67,71]] 

I looked at similar questions, but most people suggested using k-means to cluster the points, for example with scipy, which is quite confusing for a beginner like me. Also, I think k-means is better suited to clustering in two or more dimensions? Is there a way to partition an array of N numbers into several sections/clusters depending on the numbers themselves?

Some people also suggest rigid range partitioning, but that does not always give the expected results.

+50
arrays dimension cluster-analysis data-mining partition-problem
Jul 16 '12 at 22:25
2 answers

Do not use multidimensional clustering algorithms for a one-dimensional problem. A single dimension is much more special than you might naively think, because you can actually sort it, which makes things much easier.
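To illustrate why sorting helps, here is a minimal sketch (not one of the methods named in this answer): once the values are sorted, clusters can only be contiguous runs, so a single pass over neighbouring values is enough. The function name `split_on_gaps` and the threshold `max_gap` are hypothetical, and the threshold has to be chosen by hand.

```python
def split_on_gaps(values, max_gap):
    """Sort the values and start a new group whenever the gap
    between two neighbouring values exceeds max_gap (an assumed,
    hand-picked threshold)."""
    ordered = sorted(values)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_gap:
            groups.append([cur])      # gap too large: open a new group
        else:
            groups[-1].append(cur)    # still inside the current group
    return groups


data = [1, 1, 2, 3, 10, 11, 13, 67, 71]
print(split_on_gaps(data, max_gap=5))
# prints [[1, 1, 2, 3], [10, 11, 13], [67, 71]]
```

The result depends entirely on `max_gap`, which is why the statistical methods mentioned in this answer are usually preferable.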

In fact, it is usually not even called clustering, but, for example, segmentation or natural breaks optimization.
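As a rough sketch of what natural breaks optimization means: for a chosen number of classes k, pick the cut points in the sorted data that minimize the total within-class sum of squared deviations. The dynamic program below is one straightforward way to do that and is not taken from any particular library; the name `natural_breaks` is hypothetical, and k must be supplied by the caller.

```python
def natural_breaks(values, k):
    """Split sorted values into k contiguous groups that minimize the
    total within-group sum of squared deviations from the group mean."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums let us compute the cost of any slice in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        """Sum of squared deviations of xs[i:j] from its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[c][j]: minimal cost of splitting xs[:j] into c groups.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = best[c - 1][i] + cost(i, j)
                if cand < best[c][j]:
                    best[c][j] = cand
                    cut[c][j] = i

    # Walk the stored cut positions backwards to recover the groups.
    groups, j = [], n
    for c in range(k, 0, -1):
        i = cut[c][j]
        groups.append(xs[i:j])
        j = i
    return groups[::-1]


print(natural_breaks([1, 1, 2, 3, 10, 11, 13, 67, 71], k=3))
# prints [[1, 1, 2, 3], [10, 11, 13], [67, 71]]
```

This version runs in O(k·n²) time, which is fine for small arrays; dedicated implementations are considerably faster.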

You may want to look at Jenks Natural Breaks Optimization and similar statistical methods. Kernel density estimation (KDE) is also a good method to look at, with a strong statistical background. Local minima of the density are good places to split the data into clusters, and there are statistical reasons for doing so. KDE is perhaps the most sound method for clustering one-dimensional data.
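The idea of splitting at local density minima can be sketched with the standard library alone. This is a simplified illustration rather than a reference implementation: it assumes a Gaussian kernel, evaluates the (unnormalized) density on a fixed grid, and treats the bandwidth as a hand-picked parameter. The names `kde_split`, `bandwidth` and `grid_step` are hypothetical.

```python
import math
from bisect import bisect_right


def kde_split(values, bandwidth, grid_step=0.1):
    """Split 1D values at the local minima of a Gaussian kernel
    density estimate evaluated on a regular grid."""
    xs = sorted(values)
    if not xs:
        return []

    # Evaluate the density a little beyond the data range.
    lo = xs[0] - 3 * bandwidth
    hi = xs[-1] + 3 * bandwidth
    n_grid = int((hi - lo) / grid_step) + 1
    grid = [lo + i * grid_step for i in range(n_grid)]

    def density(x):
        # Unnormalized: a constant factor does not move the minima.
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in xs)

    dens = [density(x) for x in grid]

    # Interior grid points that are strictly lower than both neighbours.
    cuts = [
        grid[i]
        for i in range(1, n_grid - 1)
        if dens[i] < dens[i - 1] and dens[i] < dens[i + 1]
    ]

    # Assign each value to the interval between consecutive cut points.
    groups = [[] for _ in range(len(cuts) + 1)]
    for v in xs:
        groups[bisect_right(cuts, v)].append(v)
    return [g for g in groups if g]


data = [1, 1, 2, 3, 10, 11, 13, 67, 71]
print(kde_split(data, bandwidth=3.0))
```

The bandwidth controls how fine the split is: a smaller value produces more minima and therefore more clusters, a larger value merges neighbouring groups. In practice a library routine with automatic bandwidth selection, such as scipy's `gaussian_kde`, would normally replace the hand-written `density` function.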

With KDE, it again becomes obvious that one-dimensional data is much better behaved. In 1D, you have local minima; in 2D, you may also have saddle points and similar "maybe" split points. See the Wikipedia illustration of a saddle point for how such a point may or may not be appropriate for splitting clusters.

+78
Jul 17 '12

You can look into discretization algorithms. The 1D discretization problem is very similar to what you are asking. These algorithms choose cut-off points according to frequency, binning strategy, and so on.

weka uses the following algorithms in its discretization process.

weka.filters.supervised.attribute.Discretize

uses either Fayyad & Irani's MDL method or Kononenko's MDL criterion

weka.filters.unsupervised.attribute.Discretize

uses simple binning
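For comparison, simple binning can be sketched as equal-width binning: the value range is cut into a fixed number of intervals of the same width. This is only an illustration of the idea in Python, not Weka's code; the name `equal_width_bins` and the parameter `n_bins` are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width
    spanning the range from min(values) to max(values)."""
    xs = sorted(values)
    if not xs or n_bins < 1:
        return []
    lo, hi = xs[0], xs[-1]
    width = (hi - lo) / n_bins
    groups = [[] for _ in range(n_bins)]
    for v in xs:
        if width == 0:
            idx = 0                       # all values identical
        else:
            # Clamp so the maximum lands in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
        groups[idx].append(v)
    return groups


print(equal_width_bins([1, 1, 2, 3, 10, 11, 13, 67, 71], n_bins=3))
# prints [[1, 1, 2, 3, 10, 11, 13], [], [67, 71]]
```

On the array from the question, three equal-width bins merge the first two groups and leave the middle bin empty, which is the kind of result the question describes for rigid range splitting.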

+5
Jul 18 '12 at 10:14