Finding the spread of each cluster from K-means

I am trying to determine how well an input vector fits a given cluster center. I can find the best match quite easily (the center with the minimum Euclidean distance to the input vector is the best), but now I need to work out how good that match is.

To do this, I need to find the spread (standard deviation?) of the vectors that make up each centroid, and then check whether the distance from my input vector to the center is less than that spread. If the distance is greater than the spread, then I can say that I have no cluster that matches it (since even the best match does not fit the input vector).
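A minimal sketch of the matching rule described above, in Python with only the standard library. It assumes each cluster's spread has already been computed and is passed in as a list; the function name `match_cluster` and the tolerance factor `k` are hypothetical additions, not part of the question.

```python
import math
from typing import Optional, Sequence


def match_cluster(x: Sequence[float],
                  centers: Sequence[Sequence[float]],
                  spreads: Sequence[float],
                  k: float = 1.0) -> Optional[int]:
    """Return the index of the nearest center, or None if x lies farther
    than k * spread from that center (i.e. no cluster matches)."""
    # Best match: the center with the minimum Euclidean distance to x.
    best = min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))
    # Accept the best match only if x falls within that cluster's spread.
    # k is a hypothetical tolerance factor (k = 1.0 means "within one spread").
    if math.dist(x, centers[best]) <= k * spreads[best]:
        return best
    return None


centers = [[0.0, 0.0], [10.0, 10.0]]
spreads = [1.5, 2.0]
print(match_cluster([0.5, 1.0], centers, spreads))  # 0
print(match_cluster([5.0, 5.0], centers, spreads))  # None
```

The second point is about 7.07 from both centers, well beyond either spread, so it is reported as unmatched rather than forced into the nearest cluster.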

I am not sure how to find the spread of each cluster. I have all the center vectors, and all the training vectors are labelled with their closest cluster; I just can't work out what I need to do to compute the spread.

I hope this is clear. If not, I will try to rephrase it! TIA, Ian


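Use the distance function to calculate the distance from your center point to each point labelled with that cluster, then take the average of these distances. This gives the mean distance to the centroid, which is a reasonable measure of the cluster's spread. Strictly speaking, it is not the standard deviation: for that, average the squared distances and take the square root (the root-mean-square distance).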
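A minimal sketch of this per-cluster calculation in Python (standard library only), assuming `points` are the training vectors, `labels[i]` is the index of the cluster that `points[i]` was assigned to, and `centers` are the K-means centers. The function name `cluster_spreads` is hypothetical. It returns both the mean distance and the root-mean-square (RMS) distance, so the two measures can be compared.

```python
import math
from typing import List, Sequence, Tuple


def cluster_spreads(points: Sequence[Sequence[float]],
                    labels: Sequence[int],
                    centers: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """For each cluster, return (mean distance, RMS distance) from its
    center to the training points labelled with that cluster."""
    sums = [0.0] * len(centers)     # running sum of distances per cluster
    sq_sums = [0.0] * len(centers)  # running sum of squared distances
    counts = [0] * len(centers)     # number of points per cluster
    for p, c in zip(points, labels):
        d = math.dist(p, centers[c])
        sums[c] += d
        sq_sums[c] += d * d
        counts[c] += 1
    result = []
    for s, sq, n in zip(sums, sq_sums, counts):
        if n == 0:
            result.append((0.0, 0.0))  # empty cluster: report zero spread
        else:
            result.append((s / n, math.sqrt(sq / n)))
    return result


points = [[3.0, 4.0], [-3.0, -4.0], [10.0, 11.0], [10.0, 7.0]]
labels = [0, 0, 1, 1]
centers = [[0.0, 0.0], [10.0, 10.0]]
print(cluster_spreads(points, labels, centers))
# cluster 0: mean 5.0, RMS 5.0; cluster 1: mean 2.0, RMS about 2.236
```

When each center is the mean of its labelled points (as it is once K-means has converged), the RMS distance equals the square root of the sum of the per-dimension variances, which is why it is the usual "standard deviation" of a cluster. The mean distance is always less than or equal to it, as cluster 1 shows.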


Source: https://habr.com/ru/post/1734113/

