Order clustered points using Kmeans and R

I have a dataset of 5000 points in 4 dimensions that I have clustered using kmeans in R.

I want to order the points in each cluster by their distance to the center of that cluster.

To keep it simple, the data looks like this (I am using a subset to test different approaches):

    id Ans Acc Que Kudos
     1 100 100 100   100
     2  85  83  80    75
     3  69  65  30    29
     4  41  45  30    22
     5  10  12  18    16
     6  10  13  10     9
     7  10  16  16    19
     8  65  68 100   100
     9  36  30  35    29
    10  36  30  26    22
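
To make the question reproducible, the sample table can be typed in as a data frame. This is a minimal sketch; the object names dat and features are my own, and it assumes that id only labels the rows and should not be used as a clustering feature.

```r
# Sample data from the question, typed in by hand
dat <- data.frame(
  id    = 1:10,
  Ans   = c(100, 85, 69, 41, 10, 10, 10, 65, 36, 36),
  Acc   = c(100, 83, 65, 45, 12, 13, 16, 68, 30, 30),
  Que   = c(100, 80, 30, 30, 18, 10, 16, 100, 35, 26),
  Kudos = c(100, 75, 29, 22, 16, 9, 19, 100, 29, 22)
)

# Assumption: 'id' only labels the rows, so it is left out of the features
features <- dat[, c("Ans", "Acc", "Que", "Kudos")]
str(features)
```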

First, I used the following call to split the dataset into 2 clusters:

 (result <- kmeans(data, 2)) 

This returns a kmeans object with components such as cluster and centers.

But I can't figure out how to compare each point with its cluster center and create an ordered list.
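
A minimal sketch of one way to do this with the kmeans object alone, assuming numeric features and Euclidean distance: indexing cl$centers by cl$cluster gives every point the center of its own cluster, the distance is then a row-wise sum of squares, and order(cluster, distance) produces the ranked list. The names features, own_center, d and ranked are only illustrative.

```r
# Sample data from the question (features only)
features <- data.frame(
  Ans   = c(100, 85, 69, 41, 10, 10, 10, 65, 36, 36),
  Acc   = c(100, 83, 65, 45, 12, 13, 16, 68, 30, 30),
  Que   = c(100, 80, 30, 30, 18, 10, 16, 100, 35, 26),
  Kudos = c(100, 75, 29, 22, 16, 9, 19, 100, 29, 22)
)

set.seed(1)                                  # kmeans() uses random starts
cl <- kmeans(features, centers = 2, nstart = 10)

# cl$cluster: cluster index of each row; cl$centers: one row per cluster.
# Indexing the centers by cl$cluster repeats, for every point, the center
# of its own cluster (an n x 4 matrix).
own_center <- cl$centers[cl$cluster, , drop = FALSE]

# Euclidean distance of each point to its own center
d <- sqrt(rowSums((as.matrix(features) - own_center)^2))

# Ranked list: cluster by cluster, nearest point first
ranked <- cbind(id = seq_len(nrow(features)), features,
                cluster = cl$cluster, dist = d)
ranked <- ranked[order(ranked$cluster, ranked$dist), ]
print(ranked)
```

Which cluster is numbered 1 depends on the random starts, which is why set.seed() is called before kmeans().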

Secondly, I tried the serialization approach proposed by another SO user here

I used the following commands:

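    clus <- kmeans(scale(x, scale = FALSE), centers = 3, iter.max = 50, nstart = 10)
    # mean of all values in each cluster
    mns <- sapply(split(x, clus$cluster), function(x) mean(unlist(x)))
    # groups the rows by cluster; rows inside a cluster keep their original order
    result <- x[order(order(mns)[clus$cluster]), ]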
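
To see what the last line computes, here it is on a hypothetical toy vector of cluster means (the values are made up). order(order(mns)[clus$cluster]) only gives rows of the same cluster the same sort key, so it groups them together; because order(mns) is not each cluster's rank, the clusters are not necessarily sorted by their mean either, which rank() would give. Neither version looks at distances to the center, so rows within a cluster stay in their original order.

```r
# Hypothetical values: three clusters with mean values 50, 10 and 30
mns     <- c(`1` = 50, `2` = 10, `3` = 30)
cluster <- c(1, 2, 3, 1, 2)      # cluster label of each of 5 rows

# order(mns) lists cluster indices from smallest to largest mean
order(mns)                        # 2 3 1

# order(mns)[cluster] is not each row's rank; it only makes rows of the
# same cluster share a key, which groups them together
idx_order <- order(order(mns)[cluster])
idx_order                         # 3 1 4 2 5 -> clusters 3, 1, 2

# rank(mns)[cluster] gives each row the rank of its cluster's mean,
# so clusters come out in increasing order of their mean
idx_rank <- order(rank(mns)[cluster])
idx_rank                          # 2 5 3 1 4 -> clusters 2, 3, 1
```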

It seems that an ordered list is created, but when I bind it to the cluster labels with the following cbind command:

 result <- cbind(x[order(order(mns)[clus$cluster]), ],clus$cluster) 

I get the following result, which does not seem to be ordered correctly:

       id Ans Acc Que Kudos clus
    1   3  69  65  30    29    1
    2   4  41  45  30    22    1
    3   5  10  12  18    16    2
    4   6  10  13  10     9    2
    5   7  10  16  16    19    2
    6   9  36  30  35    29    2
    7  10  36  30  26    22    2
    8   1 100 100 100   100    1
    9   2  85  83  80    75    2
    10  8  65  68 100   100    2
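
One likely reason the labels look wrong: in the cbind call the rows of x are permuted, but clus$cluster is appended in its original order, so each label ends up next to whatever row now sits at that position. The clus column above is consistent with this, since it reads exactly like the labels of rows 1 to 10 in their original order when rows 1, 2 and 8 form one cluster. A sketch that applies the same permutation to both, assuming x holds only the four numeric columns (the seed is arbitrary):

```r
x <- data.frame(
  Ans   = c(100, 85, 69, 41, 10, 10, 10, 65, 36, 36),
  Acc   = c(100, 83, 65, 45, 12, 13, 16, 68, 30, 30),
  Que   = c(100, 80, 30, 30, 18, 10, 16, 100, 35, 26),
  Kudos = c(100, 75, 29, 22, 16, 9, 19, 100, 29, 22)
)
set.seed(42)                       # arbitrary seed for reproducible starts
clus <- kmeans(x, centers = 2, nstart = 10)
mns  <- sapply(split(x, clus$cluster), function(g) mean(unlist(g)))

# Compute the row permutation once ...
idx <- order(order(mns)[clus$cluster])

# ... and apply it to the rows AND to the labels, so they stay paired.
# This still only groups rows by cluster; it does not sort by distance.
result <- cbind(x[idx, ], clus = clus$cluster[idx])
print(result)
```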

I would rather not just copy commands blindly; I want to understand how this approach works. If anyone could help or shed some light on this, that would be really great.

EDIT:

Since the clusters are easy to build, I suspect there is a simpler way to get and rank the distances between the points and their cluster center.

The centers of these clusters (with k = 2) are as follows, but I do not know how to extract them and compare them with each individual point:

           Ans    Accep      Que    Kudos
    1 83.33333 83.66667 93.33333 91.66667
    2 30.28571 30.14286 23.57143 20.85714
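
The centers above are exactly the column means of rows 1, 2 and 8 and of the remaining seven rows of the sample table, so the comparison can be sketched with those labels hard-coded (an assumption standing in for clus$cluster and clus$centers). Computing the distance from every point to every center gives an n x k matrix; picking each point's own column and ordering by cluster and distance yields the ranked list.

```r
# Sample data from the question (features only), as a numeric matrix
X <- cbind(
  Ans   = c(100, 85, 69, 41, 10, 10, 10, 65, 36, 36),
  Acc   = c(100, 83, 65, 45, 12, 13, 16, 68, 30, 30),
  Que   = c(100, 80, 30, 30, 18, 10, 16, 100, 35, 26),
  Kudos = c(100, 75, 29, 22, 16, 9, 19, 100, 29, 22)
)
# Assumption: labels of the k = 2 fit (rows 1, 2 and 8 form cluster 1);
# in practice use clus$cluster instead
cluster <- c(1, 1, 2, 2, 2, 2, 2, 1, 2, 2)

# Column means per cluster, which reproduce the centers printed above;
# in practice use clus$centers instead
centers <- rowsum(X, cluster) / as.vector(table(cluster))

# n x k matrix: column j holds each point's Euclidean distance to center j
D <- sapply(seq_len(nrow(centers)), function(j)
  sqrt(colSums((t(X) - centers[j, ])^2)))

# Nearest center per point (agrees with 'cluster' for this data)
nearest <- apply(D, 1, which.min)

# Distance of each point to its own center, then the ranked list
own    <- D[cbind(seq_len(nrow(X)), cluster)]
ranked <- data.frame(id = seq_len(nrow(X)), cluster = cluster, dist = own)
ranked <- ranked[order(ranked$cluster, ranked$dist), ]
print(ranked)
```

With these numbers the ranked ids come out as 2, 1, 8 for the first cluster and 10, 9, 4, 7, 5, 6, 3 for the second.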

NB:

I do not need any advanced use of kmeans; I just want to specify the number of clusters and get an ordered list of points from each cluster.
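
For the "give k, get ordered lists" use case, the steps can be wrapped in a small function. order_by_center is a hypothetical helper written for this sketch, not something from a package. It assumes numeric columns and Euclidean distance, and it uses fitted(cl, method = "centers") from the stats package, which returns each point's own cluster center.

```r
# Hypothetical helper, not from any package: cluster 'data' into k groups and
# return one element per cluster holding the row indices sorted from the
# point nearest to the center to the farthest one.
order_by_center <- function(data, k, nstart = 10) {
  X  <- as.matrix(data)
  cl <- kmeans(X, centers = k, nstart = nstart)
  # fitted(..., method = "centers") gives each point's own cluster center
  d  <- sqrt(rowSums((X - fitted(cl, method = "centers"))^2))
  ids <- seq_len(nrow(X))
  lapply(split(ids, cl$cluster), function(i) i[order(d[i])])
}

# Usage on made-up data with two well-separated groups
set.seed(1)
toy <- data.frame(a = c(rnorm(20), rnorm(20, mean = 10)),
                  b = c(rnorm(20), rnorm(20, mean = 10)))
res <- order_by_center(toy, k = 2)
str(res)
```

Returning a list (one element per cluster) keeps the clusters separate; unlist(res) flattens it into one ordered vector of row indices if that is more convenient.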

Answer:

Here is an example that does what you ask, using the first example from ?kmeans. It is probably not very efficient, but you can build on it.

    # Taken straight from ?kmeans
    x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
               matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
    colnames(x) <- c("x", "y")
    cl <- kmeans(x, 2)
    x <- cbind(x, cl = cl$cluster)
    # Function to apply to each cluster to do the ordering
    orderCluster <- function(i, data, centers) {
      # Extract the cluster's rows (drop = FALSE keeps a matrix even for one row)
      dt <- data[data[, 3] == i, , drop = FALSE]
      ct <- centers[i, ]
      # Squared distance to the center; sweep() subtracts ct from every row
      # (plain dt[, 1:2] - ct would recycle ct down the columns instead)
      dt <- cbind(dt, dist = rowSums(sweep(dt[, 1:2, drop = FALSE], 2, ct)^2))
      # Sort by distance
      dt[order(dt[, 4]), , drop = FALSE]
    }
    do.call(rbind, lapply(sort(unique(cl$cluster)), orderCluster,
                          data = x, centers = cl$centers))
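
A note on the distance step: it subtracts a center vector from a matrix whose rows are points. In R, matrix - vector recycles the vector down the columns, not along each row, so with two or more rows the center coordinates get mixed up. A small example with made-up values shows the difference; sweep() from base R and a double transpose both subtract the center from every row.

```r
# Two points stored as the rows of a matrix, and a center (made-up values)
M  <- matrix(c(1, 2, 3, 4), nrow = 2)   # rows are (1, 3) and (2, 4)
ct <- c(10, 100)

M - ct            # ct is recycled down the columns: rows (-9, -7) and (-98, -96)
sweep(M, 2, ct)   # ct subtracted from every row:    rows (-9, -97) and (-8, -96)
t(t(M) - ct)      # same as sweep()
```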

Source: https://habr.com/ru/post/912758/

