Agglomerative clustering in Matlab

Question


I have a simple two-dimensional data set that I want to cluster in an agglomerative way (without knowing the optimal number of clusters to use). The only way I was able to cluster my data was by setting the function to "maxclust".

For simplicity, let's say this is my dataset:

X=[ 1,1; 1,2; 2,2; 2,1; 5,4; 5,5; 6,5; 6,4 ];

Naturally, I would like this data to form 2 clusters. I understand that if I knew this, I could just say:

 T = clusterdata(X,'maxclust',2);

and to find which points fall into each cluster, I could say:

 cluster_1 = X(T==1, :);

and

 cluster_2 = X(T==2, :);

but not knowing that 2 clusters will be optimal for this data set, how do I group this data?
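Whatever number of clusters is eventually chosen, the per-cluster extraction above does not need one hand-written variable per cluster. A minimal sketch in base MATLAB: the label vector T is hard-coded here for illustration (in practice it would come from clusterdata), and the names labels and clusters are illustrative, not part of any API.

```matlab
% Split the rows of X into one cell per cluster label, for any number of clusters.
% Assumption: T is hard-coded for illustration; in practice it would come from
% e.g. T = clusterdata(X, 'maxclust', k).
X = [1,1; 1,2; 2,2; 2,1; 5,4; 5,5; 6,5; 6,4];
T = [1; 1; 1; 1; 2; 2; 2; 2];

labels = unique(T);                       % distinct cluster labels, sorted
clusters = cell(numel(labels), 1);        % one cell per cluster
for k = 1:numel(labels)
    clusters{k} = X(T == labels(k), :);   % rows of X assigned to this label
end
```

clusters{1} then plays the role of cluster_1, clusters{2} of cluster_2, and so on, however many labels T contains.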

thanks


matlab classification cluster-analysis dendrogram

Kevin_TA Nov 04 '11 at 22:13


3 answers

To select the optimal number of clusters, one common approach is to make a plot similar to a scree plot and look for the “elbow” in it; the number of clusters at the elbow is the one you select. As the criterion, we will use the within-cluster sum of squares:

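 function wss = plotScree(X, n)
 % Within-cluster sum of squares for 1..n clusters, drawn as a scree plot
     wss = zeros(1, n);
     wss(1) = (size(X, 1)-1) * sum(var(X, [], 1));   % one cluster: total sum of squares
     for i = 2:n
         T = clusterdata(X, 'maxclust', i);
         % per cluster: (count - 1) * sum of column variances = its sum of squares
         wss(i) = sum((grpstats(T, T, 'numel')-1) .* sum(grpstats(X, T, 'var'), 2));
     end
     hold on
     plot(wss)
     plot(wss, '.')
     xlabel('Number of clusters')
     ylabel('Within-cluster sum-of-squares')
 end

 >> plotScree(X, 5)
 ans =
    54.0000    4.0000    3.3333    2.5000    2.0000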
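If you prefer not to read the elbow off the plot by eye, one simple heuristic (not part of this answer) is to take the point of sharpest bend, approximated by the largest second difference of the curve. A minimal sketch, assuming the wss values printed above; kBest is an illustrative name.

```matlab
% Heuristic elbow detection: the largest second difference marks the sharpest bend.
% Assumption: wss holds the values printed by plotScree(X, 5) above.
wss = [54.0000 4.0000 3.3333 2.5000 2.0000];

d2 = diff(wss, 2);      % d2(j) = wss(j) - 2*wss(j+1) + wss(j+2)
[~, j] = max(d2);       % index of the sharpest bend
kBest = j + 1;          % d2(j) is centred on wss(j+1), i.e. on j+1 clusters
```

For this data the drop from 1 to 2 clusters dwarfs every later one, so kBest comes out as 2. On curves without a clear bend the heuristic is fragile, so it is worth checking against the plot.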


John colby Nov 04 '11 at 23:17


You can use the NbClust package in R, which uses 30 indexes to determine the optimal number of clusters in a dataset.
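Staying inside MATLAB, a much smaller set of such criteria is available through evalclusters in the Statistics and Machine Learning Toolbox (in releases later than this thread). A minimal sketch, assuming that toolbox is installed; the choice of the Calinski-Harabasz criterion and of the candidate range 2:5 is illustrative.

```matlab
% Score candidate cluster counts for agglomerative clustering and keep the best one.
% Assumption: Statistics and Machine Learning Toolbox with evalclusters available.
X = [1,1; 1,2; 2,2; 2,1; 5,4; 5,5; 6,5; 6,4];

eva = evalclusters(X, 'linkage', 'CalinskiHarabasz', 'KList', 2:5);
k = eva.OptimalK;       % number of clusters with the best criterion value
T = eva.OptimalY;       % cluster labels for that number of clusters
```

Other criterion names accepted by evalclusters include 'Silhouette', 'DaviesBouldin' and 'gap'; comparing a few of them is a cheap sanity check.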


Richie Nov 14 '14 at 14:33


Amro · Accepted Answer · 2011-11-05T01:35:04+0000

The whole point of this method is that it represents the clusters it finds as a hierarchy, and you decide how much detail you want to get.

[Figure: agglomerative dendrogram]

Think of it as a horizontal line crossing the dendrogram, which can move from 0 (each point is its own cluster) up to the maximum height (all points in one cluster). You could:

- stop when you reach a specified number of clusters
- manually place the line at a specific height value
- choose a place where the clusters are far apart according to the distance criterion (i.e. where there is a big jump to the next level)
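The third option, cutting where the jump to the next merge level is largest, can be sketched in base MATLAB. This assumes the merge heights (the third column of Z = linkage(X)) are available; here they are hard-coded for the example X under linkage's defaults (single linkage, Euclidean distance), where six merges happen at height 1 and the last one at sqrt(13). The variable names are illustrative.

```matlab
% Cut the dendrogram at the largest jump between consecutive merge heights.
% Assumption: heights = Z(:,3) for Z = linkage(X) on the example data.
heights = [1 1 1 1 1 1 sqrt(13)];
n = numel(heights) + 1;            % a linkage of n points has n-1 merges

h = sort(heights);                 % merge heights in increasing order
gaps = diff(h);                    % jump between consecutive merge levels
[~, idx] = max(gaps);              % largest jump lies between h(idx) and h(idx+1)
cutoff = (h(idx) + h(idx+1)) / 2;  % place the horizontal line halfway across it
k = n - idx;                       % after idx merges, n - idx clusters remain
```

Here k is 2 and the cutoff lands at about 2.30. Either value can then be handed to the toolbox, as cluster(Z, 'maxclust', k) or as cluster(Z, 'cutoff', cutoff, 'criterion', 'distance').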

This can be done using either the 'maxclust' or the 'cutoff' option of the CLUSTER / CLUSTERDATA functions.
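A minimal sketch of the first two options with linkage and cluster, assuming the Statistics and Machine Learning Toolbox; the height 2 is an illustrative value chosen between the merge heights 1 and sqrt(13) of the example data.

```matlab
% Cut one hierarchy in two ways. Assumption: Statistics and Machine Learning Toolbox.
X = [1,1; 1,2; 2,2; 2,1; 5,4; 5,5; 6,5; 6,4];
Z = linkage(X);          % defaults: single linkage, Euclidean distance

% Option 1: stop at a given number of clusters.
T1 = cluster(Z, 'maxclust', 2);

% Option 2: cut at a chosen height. Without 'criterion', 'distance', the cutoff
% would instead be compared against the inconsistency coefficient.
T2 = cluster(Z, 'cutoff', 2, 'criterion', 'distance');
```

Building Z once and calling cluster repeatedly avoids recomputing the tree, which clusterdata does on every call; dendrogram(Z) shows the same tree the cuts are applied to.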

