Get cluster size in sklearn in python

I use sklearn DBSCAN to cluster my data as follows.

#Apply DBSCAN (sims == my data as list of lists)
db1 = DBSCAN(min_samples=1, metric='precomputed').fit(sims)

db1_labels = db1.labels_
db1n_clusters_ = len(set(db1_labels)) - (1 if -1 in db1_labels else 0)
#Returns the number of clusters (E.g., 10 clusters)
print('Estimated number of clusters: %d' % db1n_clusters_)

Now I want to get the top 3 clusters sorted by size (the number of data points in each cluster). How can I get the cluster sizes in sklearn?
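For reference, here is a minimal runnable sketch of the setup described above, using a small hypothetical precomputed distance matrix in place of `sims` (the matrix values and `eps` are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical small symmetric distance matrix standing in for `sims`
sims = np.array([
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
])

# Apply DBSCAN on the precomputed distances
db1 = DBSCAN(eps=0.5, min_samples=1, metric='precomputed').fit(sims)

db1_labels = db1.labels_
db1n_clusters_ = len(set(db1_labels)) - (1 if -1 in db1_labels else 0)
print('Estimated number of clusters: %d' % db1n_clusters_)  # 2 for this toy matrix
```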

2 answers

Another option is to use numpy.unique:

import numpy as np

db1_labels = db1.labels_
# Count each non-noise label (DBSCAN marks noise as -1)
labels, counts = np.unique(db1_labels[db1_labels >= 0], return_counts=True)
print(labels[np.argsort(-counts)[:3]])
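To also report how large each of those clusters is, you can reorder both arrays returned by `np.unique` with the same index. A self-contained sketch with a hypothetical labels array (the values are made up for illustration):

```python
import numpy as np

# Hypothetical DBSCAN labels; -1 denotes noise points
db1_labels = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, -1])

# Unique non-noise labels and their frequencies
labels, counts = np.unique(db1_labels[db1_labels >= 0], return_counts=True)

order = np.argsort(-counts)[:3]   # indices of the 3 largest clusters
top_labels = labels[order]        # cluster ids, largest first
top_sizes = counts[order]         # their sizes

print(top_labels)  # [2 0 1]
print(top_sizes)   # [4 3 2]
```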

Well, you can use `numpy.bincount` to get the label frequencies. As an example, take the DBSCAN demo from the scikit-learn documentation:

# Store the labels
labels = db.labels_

# Then get the frequency count of the non-negative labels
counts = np.bincount(labels[labels >= 0])

print(counts)
# Output: [243 244 245]

Now, to get the top clusters sorted by size, use numpy's argsort on the negated counts. Since this example has only 3 clusters, here we take the top 2:

top_labels = np.argsort(-counts)[:2]

print(top_labels)
# Output: [2 1]

# To get their respective frequencies
print(counts[top_labels])
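Once you know the top labels, you often also want the data points belonging to each of those clusters; `np.where` gives their indices. A short sketch with a hypothetical labels array (the values are made up for illustration):

```python
import numpy as np

# Hypothetical DBSCAN labels for 6 points
labels = np.array([0, 1, 2, 2, 1, 2])

counts = np.bincount(labels[labels >= 0])
top_labels = np.argsort(-counts)[:2]     # the two largest clusters

# Indices of the points in each top cluster
members = {lab: np.where(labels == lab)[0] for lab in top_labels}
for lab in top_labels:
    print(lab, members[lab])
```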

Source: https://habr.com/ru/post/1685455/
