I developed sample code to visualize the clustering of multi-dimensional data using all possible 2-D projections of the data. This may not be the best approach to visualization (there are dedicated techniques for it, and SOM itself can be used for this purpose), especially for a larger number of dimensions, but when the number of possible projections, n(n-1)/2 for n dimensions, is not too large, it is quite a good visualizer.
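As a small illustration of how many panels this involves: each unordered pair of dimensions gives one 2-D projection. The sketch below is not part of the original code; it simply enumerates the pairs with nchoosek for the 5-dimensional case used in the example further down.

```matlab
% Enumerate every 2-D projection of nDim-dimensional data.
nDim = 5;                       % number of dimensions, as in the example data
pairs = nchoosek(1:nDim, 2);    % each row holds one (x, y) pair of dimensions
nPanels = size(pairs, 1);       % equals nDim*(nDim-1)/2

for p = 1:nPanels
    fprintf('panel %2d: Dim %d vs Dim %d\n', p, pairs(p,1), pairs(p,2));
end
```

For 5 dimensions this gives 10 panels, which still fits comfortably in one figure; the count grows quadratically from there.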
Cluster Algorithm
Since I needed access to the code so that I could save the cluster centers and cluster labels for each iteration, I used the fast k-means implementation by Mo Chen available on the File Exchange (FEX), but I had to adapt it to get this access. The adapted code is as follows:
    function [label,m] = litekmeans(X, k)
    % Perform k-means clustering.
    %   X: dxn data matrix
    %   k: number of seeds
    % Written by Michael Chen (sth4nth@gmail.com).
    n = size(X,2);
    last = 0;
    iter = 1;
    label{iter} = ceil(k*rand(1,n));  % random initialization
    checkLabel = label{iter};
    m = {};
    while any(checkLabel ~= last)
        [u,~,checkLabel] = unique(checkLabel);  % remove empty clusters
        k = length(u);
        E = sparse(1:n,checkLabel,1,n,k,n);  % transform label into indicator matrix
        curM = X*(E*spdiags(1./sum(E,1)',0,k,k));  % compute m of each cluster
        m{iter} = curM;
        last = checkLabel';
        % assign samples to the nearest centers
        [~,checkLabel] = max(bsxfun(@minus,curM'*X,dot(curM,curM,1)'/2),[],1);
        iter = iter + 1;
        label{iter} = checkLabel;
    end
    % Get last clusters centers
    m{iter} = curM;
    % If to remove empty clusters:
    %for k=1:iter
    %  [~,~,label{k}] = unique(label{k});
    %end
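The two dense lines in the loop (the indicator-matrix mean and the max-based assignment) can be hard to read, so here is a minimal sketch that checks them against explicit loops. The toy data and the names Mloop, fastLabel and slowLabel are made up for illustration and are not part of the original code.

```matlab
% Toy data, chosen only for illustration: d = 2 features, n = 6 samples.
X = [0 1 0 5 6 5;
     0 0 1 5 5 6];
label = [1 1 1 2 2 2];
n = size(X,2);
k = max(label);

% Indicator matrix: E(i,j) = 1 when sample i belongs to cluster j.
E = sparse(1:n, label, 1, n, k, n);

% Scaling each column of E by 1/(cluster size) turns X*E into cluster means.
M = X * (E * spdiags(1./sum(E,1)', 0, k, k));

% The same means computed the slow, explicit way.
Mloop = zeros(size(X,1), k);
for j = 1:k
    Mloop(:,j) = mean(X(:,label == j), 2);
end

% Nearest-center assignment: maximizing m'*x - (m'*m)/2 over the clusters is
% equivalent to minimizing ||x - m||^2, because ||x||^2 does not depend on m.
[~, fastLabel] = max(bsxfun(@minus, M'*X, dot(M,M,1)'/2), [], 1);

% Reference assignment from explicit squared distances.
D = zeros(k, n);
for j = 1:k
    D(j,:) = sum(bsxfun(@minus, X, M(:,j)).^2, 1);
end
[~, slowLabel] = min(D, [], 1);
```

Both routes should agree; the sparse version just avoids the loops, which is what makes the File Exchange implementation fast.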
Making Gif
I also followed @Amro's tutorial on creating videos in Matlab to build the gif.
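For readers who have not written a gif from Matlab before, here is a minimal sketch of one common pattern. It uses the append mode of imwrite, which is an assumption on my side and not necessarily the approach from the tutorial (the final code below mentions preallocating the frames instead). The file name and the plotted data are hypothetical.

```matlab
gifName = 'sketch.gif';    % hypothetical output file name
nFrames = 5;
figH = figure('Color', 'w');

for k = 1:nFrames
    % Draw something that changes from frame to frame.
    plot(1:10, (1:10)*k, 'o-');
    ylim([0 10*nFrames]);
    drawnow;

    % Grab the figure as an RGB image and convert it to indexed color,
    % since the gif format stores indexed images.
    frame = getframe(figH);
    [idx, map] = rgb2ind(frame2im(frame), 256);

    if k == 1
        % The first frame creates the file and sets the loop count.
        imwrite(idx, map, gifName, 'gif', 'LoopCount', Inf, 'DelayTime', 0.5);
    else
        % Later frames are appended to the same file.
        imwrite(idx, map, gifName, 'gif', 'WriteMode', 'append', 'DelayTime', 0.5);
    end
end
close(figH);
```

Appending frame by frame keeps memory use low; preallocating all frames and writing once is the alternative when the number of frames is known up front.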
Distinguishable colors
I used this excellent FEX submission by Tim Holy (distinguishable_colors) to make the cluster colors easier to tell apart.
Final code
My final code is as follows. I had some problems because the number of clusters can change from one iteration to the next, which caused the scatter plot update to remove all cluster centers without raising any error. Since I did not notice this at first, I tried to work around the scatter function using every obscure method I could find on the Internet (by the way, I found a really nice scatter plot alternative here), but, fortunately, I eventually figured out what was happening. Here is the code I wrote for it. Feel free to use and adapt it, but please keep my reference if you do.
    function varargout=kmeans_test(data,nClusters,plotOpts,dimLabels,...
        bigXDim,bigYDim,gifName)
    %
    % [label,m,figH,handles]=kmeans_test(data,nClusters,plotOpts,...
    %    dimLabels,bigXDim,bigYDim,gifName)
    % Demonstrate kmeans algorithm iterative progress. Inputs are:
    %
    % -> data (rand(5,100)): the data to use.
    %
    % -> nClusters (7): number of clusters to use.
    %
    % -> plotOpts: struct holding the following fields:
    %
    %    o leftBase: the percentage distance from the left
    %
    %    o rightBase: the percentage distance from the right
    %
    %    o bottomBase: the percentage distance from the bottom
    %
    %    o topBase: the percentage distance from the top
    %
    %    o FontSize: FontSize for axes labels.
    %
    %    o widthUsableArea: Total width occupied by axes
    %
    %    o heigthUsableArea: Total height occupied by axes
    %
    % -> bigXDim (1): the big subplot x dimension
    %
    % -> bigYDim (2): the big subplot y dimension
    %
    % -> dimLabels: If you want to specify dimensions labels
    %
    % -> gifName: gif file name to save
    %
    % Outputs are:
    %
    % -> label: Sample cluster center number for each iteration
    %
    % -> m: cluster center mean for each iteration
    %
    % -> figH: figure handle
    %
    % -> handles: axes handles
    %
    %
    % - Creation Date: Fri, 13 Sep 2013
    % - Last Modified: Mon, 16 Sep 2013
    % - Author(s):
    %   - WSFreund <wsfreund_at_gmail_dot_com>
    %
    % TODO List (?):
    %
    % - Use input parser
    % - Adapt it to be able to cluster any algorithm function.
    % - Use arrows indicating cluster centers movement before moving them.
    % - Drag and drop small axes to big axes.
    %

    % Pre-start
    % Defaults are filled in from the first argument onwards, because the
    % dimLabels default needs data to exist already.
    if nargin < 1
      center1 = [1; 0; 0; 0; 0];
      center2 = [0; 1; 0; 0; 0];
      center3 = [0; 0; 1; 0; 0];
      center4 = [0; 0; 0; 1; 0];
      center5 = [0; 0; 0; 0; 1];
      center6 = [0; 0; 0; 0; 1.5];
      center7 = [0; 0; 0; 1.5; 1];
      data = [...
        bsxfun(@plus,center1,.5*rand(5,20)) ...
        bsxfun(@plus,center2,.5*rand(5,20)) ...
        bsxfun(@plus,center3,.5*rand(5,20)) ...
        bsxfun(@plus,center4,.5*rand(5,20)) ...
        bsxfun(@plus,center5,.5*rand(5,20)) ...
        bsxfun(@plus,center6,.2*rand(5,20)) ...
        bsxfun(@plus,center7,.2*rand(5,20)) ...
        ];
    end
    if nargin < 2
      nClusters = 7;
    end
    if nargin < 3
      plotOpts = struct('leftBase',.05,'rightBase',.02,...
        'bottomBase',.05,'topBase',.02,'FontSize',10,...
        'widthUsableArea',.87,'heigthUsableArea',.87);
    end
    if nargin < 4
      nDim = size(data,1);
      maxDigits = numel(num2str(nDim));
      dimLabels = mat2cell(sprintf(['Dim %0' num2str(maxDigits) 'd'],...
        1:nDim),1,zeros(1,nDim)+4+maxDigits);
    end
    if nargin < 5
      bigXDim = 1;
    end
    if nargin < 6
      bigYDim = 2;
    end
    if nargin < 7
      gifName = 'kmeansClusterization.gif';
    end

    % NOTE of advice: It seems that Matlab does not test while on
    % refreshdata if the dimension of the inputs are equivalent for the
    % XData, YData and CData while using scatter. Because of this I wasted
    % a lot of time trying to debug what was the problem, trying many
    % workaround since my cluster centers would disappear for no reason.

    % Draw axes:
    nDim = size(data,1);
    figH = figure;
    set(figH,'Units', 'normalized', 'Position',...
      [0, 0, 1, 1],'Color','w','Name',...
      'k-means example','NumberTitle','Off',...
      'MenuBar','none','Toolbar','figure',...
      'Renderer','zbuffer');

    % Create distinguishable colors matrix:
    colorMatrix = distinguishable_colors(nClusters);

    % Create axes, deploy them on handles matrix more or less how they
    % will be positioned:
    [handles,horSpace,vertSpace] = ...
      createAxesGrid(5,5,plotOpts,dimLabels);

    % Add main axes
    bigSubSize = ceil(nDim/2);
    bigSubVec(bigSubSize^2) = 0;
    for k = 0:nDim-bigSubSize
      bigSubVec(k*bigSubSize+1:(k+1)*bigSubSize) = ...
        ... %(nDim-bigSubSize+k)*nDim+1:(nDim-bigSubSize+k)*nDim+(nDim-bigSubSize+1);
        bigSubSize+nDim*k:nDim*(k+1);
    end
    handles(bigSubSize,bigSubSize) = subplot(nDim,nDim,bigSubVec,...
      'FontSize',plotOpts.FontSize,'box','on');
    bigSubplotH = handles(bigSubSize,bigSubSize);
    horSpace(bigSubSize,bigSubSize) = bigSubSize;
    vertSpace(bigSubSize,bigSubSize) = bigSubSize;
    set(bigSubplotH,'NextPlot','add',...
      'FontSize',plotOpts.FontSize,'box','on',...
      'XAxisLocation','top','YAxisLocation','right');

    % Squeeze axes through space to optimize space usage and improve
    % visualization capability:
    [leftPos,botPos,subplotWidth,subplotHeight]=setCustomPlotArea(...
      handles,plotOpts,horSpace,vertSpace);

    pColorAxes = axes('Position',[leftPos(end) botPos(end) ...
      subplotWidth subplotHeight],'Parent',figH);
    pcolor([1:nClusters+1;1:nClusters+1])
    % image(reshape(colorMatrix,[1 size(colorMatrix)])); % Used image to
    % check if the upcoming buggy behaviour would be fixed. I was not
    % lucky, though...
    colormap(pColorAxes,colorMatrix);
    % Change XTick positions to its center:
    set(pColorAxes,'XTick',.5:1:nClusters+.5);
    set(pColorAxes,'YTick',[]);
    % Change its label to cluster number:
    set(pColorAxes,'XTickLabel',[nClusters 1:nClusters-1]); % FIXME At
    % least on my matlab I have to use this buggy way to set XTickLabel.
    % Am I doing something wrong? Since I dunno why this is caused, I just
    % change the code so that it looks the way it should look, but this is
    % quite strange...
    xlabel(pColorAxes,'Clusters Colors','FontSize',plotOpts.FontSize);

    % Now iterate through data and get cluster information:
    [label,m]=litekmeans(data,nClusters);

    nIters = numel(m)-1;

    scatterColors = colorMatrix(label{1},:);

    annH = annotation('textbox',[leftPos(1),botPos(1) subplotWidth ...
      subplotHeight],'String',sprintf('Start Conditions'),'EdgeColor',...
      'none','FontSize',18);

    % Creates dimData_%d variables for first iteration:
    for curDim=1:nDim
      curDimVarName = genvarname(sprintf('dimData_%d',curDim));
      eval([curDimVarName,'= m{1}(curDim,:);']);
    end

    % clusterColors will hold the colors for the total number of clusters
    % on each iteration:
    clusterColors = colorMatrix;

    % Draw cluster information for first iteration:
    for curColumn=1:nDim
      for curLine=curColumn+1:nDim
        % Big subplot data:
        if curColumn == bigXDim && curLine == bigYDim
          curAxes = handles(bigSubSize,bigSubSize);
          curScatter = scatter(curAxes,data(curColumn,:),...
            data(curLine,:),16,'filled');
          set(curScatter,'CDataSource','scatterColors');
          % Draw cluster centers
          curColumnVarName = genvarname(sprintf('dimData_%d',curColumn));
          curLineVarName = genvarname(sprintf('dimData_%d',curLine));
          eval(['curScatter=scatter(curAxes,' curColumnVarName ',' ...
            curLineVarName ',100,colorMatrix,''^'',''filled'');']);
          set(curScatter,'XDataSource',curColumnVarName,'YDataSource',...
            curLineVarName,'CDataSource','clusterColors')
        end
        % Small subplots data:
        curAxes = handles(curLine,curColumn);
        % Draw data:
        curScatter = scatter(curAxes,data(curColumn,:),...
          data(curLine,:),16,'filled');
        set(curScatter,'CDataSource','scatterColors');
        % Draw cluster centers
        curColumnVarName = genvarname(sprintf('dimData_%d',curColumn));
        curLineVarName = genvarname(sprintf('dimData_%d',curLine));
        eval(['curScatter=scatter(curAxes,' curColumnVarName ',' ...
          curLineVarName ',100,colorMatrix,''^'',''filled'');']);
        set(curScatter,'XDataSource',curColumnVarName,'YDataSource',...
          curLineVarName,'CDataSource','clusterColors');
        if curLine==nDim
          xlabel(curAxes,dimLabels{curColumn});
          set(curAxes,'XTick',xlim(curAxes));
        end
        if curColumn==1
          ylabel(curAxes,dimLabels{curLine});
          set(curAxes,'YTick',ylim(curAxes));
        end
      end
    end

    refreshdata(figH,'caller');

    % Preallocate gif frame. From Amro tutorial here:
    % /questions/200405/approaches-to-create-a-video-in-matlab/1092876
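The pitfall described in the note of advice (cluster centers vanishing when the number of clusters changes) comes down to keeping XData, YData and CData the same length. Here is a minimal sketch of that idea in isolation. It sets the properties directly with set instead of going through data sources and refreshdata, which is a simplification on my side, and the three centers and the keep vector are made up for illustration.

```matlab
figH = figure;

% Three hypothetical cluster centers with one RGB color per center.
cx = [1 2 3];
cy = [1 2 3];
cc = [1 0 0; 0 1 0; 0 0 1];
h = scatter(cx, cy, 100, cc, '^', 'filled');

% Suppose the second cluster became empty and was removed. The coordinates
% and the color matrix must shrink together, otherwise the marker colors no
% longer match the number of points.
keep = [1 3];
set(h, 'XData', cx(keep), 'YData', cy(keep), 'CData', cc(keep,:));
drawnow;

nPoints = numel(get(h, 'XData'));
nColors = size(get(h, 'CData'), 1);
```

In kmeans_test the same rule applies to the dimData_%d variables and clusterColors: whenever the number of centers changes, the color matrix has to be trimmed to the same number of rows before the data is refreshed.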
Example
Here is an example with 5 dimensions, using the following code:
    center1 = [1; 0; 0; 0; 0];
    center2 = [0; 1; 0; 0; 0];
    center3 = [0; 0; 1; 0; 0];
    center4 = [0; 0; 0; 1; 0];
    center5 = [0; 0; 0; 0; 1];
    center6 = [0; 0; 0; 0; 1.5];
    center7 = [0; 0; 0; 1.5; 1];
    data = [...
      bsxfun(@plus,center1,.5*rand(5,20)) ...
      bsxfun(@plus,center2,.5*rand(5,20)) ...
      bsxfun(@plus,center3,.5*rand(5,20)) ...
      bsxfun(@plus,center4,.5*rand(5,20)) ...
      bsxfun(@plus,center5,.5*rand(5,20)) ...
      bsxfun(@plus,center6,.2*rand(5,20)) ...
      bsxfun(@plus,center7,.2*rand(5,20)) ...
      ];
    [label,m,figH,handles]=kmeans_test(data,20);