Sklearn agglomerative clustering input

I have a similarity matrix between four users. I want to do agglomerative clustering. The code is as follows:

    import time

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    lena = np.matrix('1 1 0 0;1 1 0 0;0 0 1 0.2;0 0 0.2 1')
    X = np.reshape(lena, (-1, 1))

    print("Compute structured hierarchical clustering...")
    st = time.time()
    n_clusters = 3  # number of regions
    ward = AgglomerativeClustering(n_clusters=n_clusters, linkage='complete').fit(X)
    print(ward)
    label = np.reshape(ward.labels_, lena.shape)
    print("Elapsed time: ", time.time() - st)
    print("Number of pixels: ", label.size)
    print("Number of clusters: ", np.unique(label).size)
    print(label)

The result of printing the label is as follows:

    [[1 1 0 0]
     [1 1 0 0]
     [0 0 1 2]
     [0 0 2 1]]

Does this mean that it returns a list of possible clustering results, and that we can choose one of them, for example [0, 0, 2, 1]? If that is not the case, could you tell me how to do agglomerative clustering based on similarity? If it is correct, my real similarity matrix is huge, so how can I choose the optimal clustering result from such a long list? Thanks.

1 answer

I think the problem here is that you are fitting your model with the wrong data:

    # This will return a 4x4 matrix (similarity matrix)
    lena = np.matrix('1 1 0 0;1 1 0 0;0 0 1 0.2;0 0 0.2 1')
    # However this will return 16x1 matrix
    X = np.reshape(lena, (-1, 1))

The actual result is:

    ward.labels_
    >> array([1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 2, 0, 0, 2, 1])

This is the label of each element in the vector X, which does not make sense.
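To make the shape problem concrete, here is a small NumPy-only sketch. The variable names (`similarity`, `n_users`, `n_samples_seen`, `distinct_values`) are mine, not from the question, and I use a plain `np.array` instead of `np.matrix` purely for illustration.

```python
import numpy as np

# The 4x4 similarity matrix from the question (one row per user).
similarity = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.2],
    [0.0, 0.0, 0.2, 1.0],
])

# reshape(-1, 1) flattens the matrix into 16 one-dimensional samples,
# so an estimator's fit() would see 16 "points" instead of 4 users.
X = similarity.reshape(-1, 1)

n_users = similarity.shape[0]
n_samples_seen = X.shape[0]
print(n_users, n_samples_seen)  # prints: 4 16

# Those 16 samples take only three distinct values (0, 0.2 and 1).
distinct_values = np.unique(X)
print(len(distinct_values))  # prints: 3
```

With only three distinct values among the 16 samples, asking for 3 clusters most plausibly gives one cluster per value. That fits the grid printed in the question, where every 1 is labelled 1, every 0 is labelled 0, and every 0.2 is labelled 2. So the grid is the similarity matrix relabelled, not a list of alternative clusterings.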

If I understand your problem correctly, you need to cluster your users by the similarity (distance) between them. In that case, I suggest using spectral clustering as follows:

    import numpy as np
    from sklearn.cluster import SpectralClustering

    lena = np.matrix('1 1 0 0;1 1 0 0;0 0 1 0.2;0 0 0.2 1')
    n_clusters = 3
    SpectralClustering(n_clusters).fit_predict(lena)
    >> array([1, 1, 0, 2], dtype=int32)
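Since the question asks specifically for agglomerative clustering on a similarity matrix, here is a minimal sketch of one way to do that with SciPy's hierarchical clustering. It rests on assumptions of mine: that the similarities lie in [0, 1] and are symmetric, so `1 - similarity` can serve as a distance, and that complete linkage (as in the question) is what you want. The variable names are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import cut_tree, linkage
from scipy.spatial.distance import squareform

# The 4x4 similarity matrix from the question (one row per user).
similarity = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.2],
    [0.0, 0.0, 0.2, 1.0],
])

# Assumption: similarities are in [0, 1], so 1 - similarity is a usable
# distance (0 on the diagonal, larger means less alike).
distance = 1.0 - similarity

# linkage() expects the condensed (upper-triangle) form of the matrix.
condensed = squareform(distance)

# Build the merge tree with complete linkage, then cut it into 3 clusters.
Z = linkage(condensed, method="complete")
labels = cut_tree(Z, n_clusters=3).ravel()

# One label per user; the label numbers themselves are arbitrary.
print(labels)
```

Here you get exactly one label per user: the first two users end up in the same cluster, and the third and fourth each get their own. scikit-learn's `AgglomerativeClustering` can also take a precomputed distance matrix, but the parameter for that has been named differently across versions (`affinity` in older releases, `metric` in newer ones), so check the documentation for the version you have installed.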

Source: https://habr.com/ru/post/1233272/

