Is there a way to infer distortion for each row when creating clusters using kmeans?

Here is some code:

from scipy import stats
from sklearn.cluster import KMeans

df_tr_std = stats.zscore(df_tr[clmns])

km = KMeans(n_clusters=3, init='k-means++', n_init=10, max_iter=300, tol=1e-04, random_state=0)
y_km = km.fit_predict(df_tr_std)

I tried turning to inertia_, but that is the total distortion. The following code works to calculate individual distances:

from sklearn.metrics.pairwise import euclidean_distances
distance = euclidean_distances(km.cluster_centers_, df_tr_std)

but it splits the distances into 3 arrays (or however many clusters I create). Is there a way to do this without separating them by label/cluster?
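For example, with the variables above, the result is shaped per center rather than per row of my data:

print(distance.shape)   # (3, n_samples): one row of distances per cluster center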

I would like to extend my original dataset with a distance column so that I can identify the rows with the largest distances. I also need the closest distances, but I was able to find those using this code:

from sklearn.metrics import pairwise_distances_argmin_min
closest, _ = pairwise_distances_argmin_min(km.cluster_centers_, df_tr_std)
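If I unpack both return values, I get one nearest-sample index and one distance per cluster center, which covers what I needed there:

closest_idx, closest_dist = pairwise_distances_argmin_min(km.cluster_centers_, df_tr_std)
# closest_idx[i] is the index of the sample nearest to center i; closest_dist[i] is that distance.
# Both arrays have shape (3,) -- one entry per cluster center, not per row.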
1 answer

If I understand correctly, you want the distortion of each individual row of the clustered data. K-Means assigns every sample to the cluster whose center is closest, so the distortion of a row is simply the squared distance between that row and the center of the cluster it was assigned to. You already have everything you need to compute it.

For example:

cluster_centers = km.cluster_centers_                      # (n_clusters, n_features)
centroids = cluster_centers[y_km]                          # the assigned center for each row
distortion = ((df_tr_std - centroids)**2.0).sum(axis=1)    # squared distance per row

y_km holds the K-Means cluster label of each row, so cluster_centers[y_km] uses NumPy fancy indexing to build an array containing, for every row, the center that row was assigned to. Subtracting it from df_tr_std, squaring, and summing over the feature axis gives each row's squared distance to its own centroid.

Or, as a one-liner:

distortion = ((df_tr_std - km.cluster_centers_[y_km])**2.0).sum(axis=1)

That's it. distortion is a NumPy array of length N (one value per row), so you can attach it to your original DataFrame as a new column and sort by it to find the rows that are farthest from their centers.
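A minimal sketch of that last step, assuming df_tr is a pandas DataFrame as in your question (the column name 'distortion' is just a placeholder):

df_tr = df_tr.assign(distortion=distortion)   # attach the per-row distortion as a new column
print(df_tr.nlargest(10, 'distortion'))       # the 10 rows farthest from their assigned centroid
print(df_tr.nsmallest(10, 'distortion'))      # ...and the 10 closest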

As a sanity check, km.inertia_ is the sum of squared distances of all samples to their closest cluster center, which is exactly what distortion holds per row, so distortion.sum() should match km.inertia_.

Demo:

In [27]: import numpy as np

In [28]: from sklearn.cluster import KMeans

In [29]: df_tr_std = np.random.rand(1000,3)

In [30]: km = KMeans(n_clusters=3, init='k-means++',n_init=10,max_iter=300,tol=1e-04,random_state=0)

In [31]: y_km = km.fit_predict(df_tr_std)

In [32]: distortion = ((df_tr_std - km.cluster_centers_[y_km])**2.0).sum(axis=1)

In [33]: km.inertia_
Out[33]: 147.01626670004867

In [34]: distortion.sum()
Out[34]: 147.01626670004865

The values agree; the difference in the last decimal place is just floating-point rounding. So the per-row distortion values really do sum to the total inertia reported by scikit-learn.
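Two optional extras, in case they are useful: the square root gives plain (non-squared) distances, and np.isclose turns the comparison above into a programmatic check.

dist_to_center = np.sqrt(distortion)               # per-row Euclidean distance to the assigned center
assert np.isclose(distortion.sum(), km.inertia_)   # per-row distortions sum to the total inertia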
