I am trying to find the elbow value of k for k-means clustering using the code below:
    load CSDmat  % my data

    for k = 2:20
        opts = statset('MaxIter', 500, 'Display', 'off');
        [IDX1, C1, sumd1, D1] = kmeans(CSDmat, k, 'Replicates', 5, ...
            'options', opts, 'distance', 'correlation');  % MATLAB kmeans
        [yy, ii] = min(D1');  %% assign points to nearest center

        distort = 0;
        distort_across = 0;
        clear clusts;
        for nn = 1:k
            I = find(ii == nn);  %% indices of points in cluster nn
            J = find(ii ~= nn);  %% indices of points not in cluster nn
            clusts{nn} = I;      %% save into clusts cell array
            if (length(I) > 0)
                mu(nn,:) = mean(CSDmat(I,:));  %% update mean
                %% Compute within class distortion
                muB = repmat(mu(nn,:), length(I), 1);
                distort = distort + sum(sum((CSDmat(I,:) - muB).^2));
                %% Compute across class distortion
                muB = repmat(mu(nn,:), length(J), 1);
                distort_across = distort_across + sum(sum((CSDmat(J,:) - muB).^2));
            end
        end

        %% Set distortion as the ratio between the within
        %% class scatter and the across class scatter
        distort = distort / (distort_across + eps);
        bestD(k) = distort;
        bestC = clusts;
    end

    figure;
    plot(bestD);
bestD values (within-cluster dispersion / between-cluster dispersion):
[ 0.401970132754914 0.193697163350293 0.119427184084282 0.0872681777446508 0.0687948264457301 0.0566215549396577 0.0481117619129058 0.0420491551659459 0.0361696583755145 0.0320384092689509 0.0288948343304147 0.0262373245283877 0.0239462330460614 0.0218350896369853 0.0201506779033703 0.0186757121130685 0.0176258625858971 0.0163239661159014 0.0154933431470081 ]
Adapted from Lihi Zelnik-Manor, March 2005, Caltech.
The plot of the ratio of within-cluster dispersion to between-cluster dispersion is a smooth curve, and its knee is smooth as well, as the bestD data above shows. How can I find the knee for such graphs?
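To make the question concrete, here is a rough sketch of one common heuristic (not from the adapted code, and only one of several possible definitions of a knee): rescale both axes to [0, 1], draw the chord from the first point to the last, and take the k whose point lies farthest from that chord. The variable names `kneeK`, `chord`, and `dist` are my own, and the assumption that the 19 values correspond to k = 2:20 comes from the loop range above.

```matlab
% bestD values copied from above, assumed to correspond to k = 2:20
bestD = [0.401970132754914 0.193697163350293 0.119427184084282 ...
         0.0872681777446508 0.0687948264457301 0.0566215549396577 ...
         0.0481117619129058 0.0420491551659459 0.0361696583755145 ...
         0.0320384092689509 0.0288948343304147 0.0262373245283877 ...
         0.0239462330460614 0.0218350896369853 0.0201506779033703 ...
         0.0186757121130685 0.0176258625858971 0.0163239661159014 ...
         0.0154933431470081];
k = 2:(numel(bestD) + 1);

% Rescale both axes to [0, 1] so neither axis dominates the distance
x = (k - k(1)) / (k(end) - k(1));
y = (bestD - min(bestD)) / (max(bestD) - min(bestD));

% Unit vector along the chord from the first point to the last point
chord = [x(end) - x(1), y(end) - y(1)];
chord = chord / norm(chord);

% Perpendicular distance of every point from the chord
rel  = [x - x(1); y - y(1)];       % 2-by-n offsets from the first point
proj = chord * rel;                % 1-by-n scalar projections onto the chord
perp = rel - chord' * proj;        % 2-by-n components orthogonal to the chord
dist = sqrt(sum(perp.^2, 1));

% Heuristic knee: the k whose point is farthest from the chord
[~, idx] = max(dist);
kneeK = k(idx);
fprintf('Heuristic knee at k = %d\n', kneeK);
```

On the values listed above this heuristic picks k = 5, but the result depends on the range of k that is scanned, so I am not sure how far to trust it for a curve this smooth.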