I have a dataset (of 5000 points with 4 dimensions) that I have grouped using kmeans in R.
I want to order points in each cluster according to their distance to the center of this cluster.
The data is very simple and looks like this (I am using a subset to test different approaches):
id  Ans  Acc  Que  Kudos
 1  100  100  100    100
 2   85   83   80     75
 3   69   65   30     29
 4   41   45   30     22
 5   10   12   18     16
 6   10   13   10      9
 7   10   16   16     19
 8   65   68  100    100
 9   36   30   35     29
10   36   30   26     22
First, I used the following method to cluster a dataset into 2 clusters:
(result <- kmeans(data, 2))
This returns a kmeans object with components such as cluster, centers, etc.
But I can't figure out how to compare each point and create an ordered list.
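A minimal sketch of what such a comparison could look like, assuming Euclidean distance and assuming the id column is left out of the clustering. The names `dat`, `x`, `km`, `assigned` and `ordered` are illustrative only. Indexing the centers matrix by the cluster vector gives one center row per point, so the distance can be computed row by row:

```r
# Subset of the data shown above; id is kept only as a label
dat <- data.frame(
  id    = 1:10,
  Ans   = c(100, 85, 69, 41, 10, 10, 10, 65, 36, 36),
  Acc   = c(100, 83, 65, 45, 12, 13, 16, 68, 30, 30),
  Que   = c(100, 80, 30, 30, 18, 10, 16, 100, 35, 26),
  Kudos = c(100, 75, 29, 22, 16, 9, 19, 100, 29, 22)
)
x <- as.matrix(dat[, c("Ans", "Acc", "Que", "Kudos")])

set.seed(1)  # kmeans starts from random centers
km <- kmeans(x, centers = 2, nstart = 10)

# One center row per point: row i is the center of the cluster of point i
assigned <- km$centers[km$cluster, , drop = FALSE]

dat$cluster <- km$cluster
dat$dist    <- sqrt(rowSums((x - assigned)^2))  # Euclidean distance to own center

# Sort by cluster first, then by distance within each cluster
ordered <- dat[order(dat$cluster, dat$dist), ]
print(ordered)
```

As a sanity check, the squared distances should add up to `km$tot.withinss`.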
Secondly, I tried the serialization approach proposed by another SO user here
I use the following commands:
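# center the columns (no rescaling), then cluster
clus <- kmeans(scale(x, scale = FALSE), centers = 3, iter.max = 50, nstart = 10)
# overall mean of all values in each cluster
mns <- sapply(split(x, clus$cluster), function(x) mean(unlist(x)))
# reorder the rows by a per-cluster key derived from the cluster means
result <- x[order(order(mns)[clus$cluster]), ]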
An ordered list does seem to be created, but if I bind it to the cluster labels (using the following cbind command):
result <- cbind(x[order(order(mns)[clus$cluster]), ],clus$cluster)
I get the following result, which does not seem to be ordered correctly:
    id  Ans  Acc  Que  Kudos  clus
1    3   69   65   30     29     1
2    4   41   45   30     22     1
3    5   10   12   18     16     2
4    6   10   13   10      9     2
5    7   10   16   16     19     2
6    9   36   30   35     29     2
7   10   36   30   26     22     2
8    1  100  100  100    100     1
9    2   85   83   80     75     2
10   8   65   68  100    100     2
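One possible reading of this output, offered as a sketch rather than a diagnosis: the rows are permuted by the ordering index, but `clus$cluster` is bound in its original order, so the label column no longer lines up with the rows. The snippet below reproduces that effect and applies the same index to both. It assumes `centers = 2` (the output above only shows labels 1 and 2) and uses `rank(mns)` as the per-cluster key; the names `dat`, `idx`, `misaligned` and `aligned` are illustrative. Note that this index only groups rows by cluster; it does not sort by distance to the center.

```r
dat <- data.frame(
  id    = 1:10,
  Ans   = c(100, 85, 69, 41, 10, 10, 10, 65, 36, 36),
  Acc   = c(100, 83, 65, 45, 12, 13, 16, 68, 30, 30),
  Que   = c(100, 80, 30, 30, 18, 10, 16, 100, 35, 26),
  Kudos = c(100, 75, 29, 22, 16, 9, 19, 100, 29, 22)
)
x <- dat[, c("Ans", "Acc", "Que", "Kudos")]

set.seed(1)
clus <- kmeans(scale(x, scale = FALSE), centers = 2, nstart = 10)

# Overall mean of each cluster, used only to decide which cluster comes first
mns <- sapply(split(x, clus$cluster), function(g) mean(unlist(g)))

# Row index that groups points by the rank of their cluster mean
idx <- order(rank(mns)[clus$cluster])

# Labels left in original order: rows and labels no longer match
misaligned <- cbind(dat[idx, ], clus = unname(clus$cluster))

# Labels permuted with the same index: rows and labels match
aligned <- cbind(dat[idx, ], clus = unname(clus$cluster[idx]))
print(aligned)
```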
I do not want to run commands blindly; I want to understand how this approach works. If anyone could help or shed some light on this, that would be really great.
EDIT:
Since the clusters can be built easily, I would guess that there is also an easy way to get and rank the distances between the points and their center.
The centers for these clusters (when using k = 2) are as follows. But I do not know how to get and compare this with every single point.
       Ans     Accep       Que     Kudos
1 83.33333  83.66667  93.33333  91.66667
2 30.28571  30.14286  23.57143  20.85714
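Given a centers matrix like the one above, the comparison with every single point could be sketched by hand as follows. This assumes Euclidean distance and types in the rounded center values from the printout; `centers`, `d`, `nearest` and `ranked` are illustrative names. The result is one distance per point and center, from which each point's nearest center and its distance to it are picked.

```r
dat <- data.frame(
  id    = 1:10,
  Ans   = c(100, 85, 69, 41, 10, 10, 10, 65, 36, 36),
  Acc   = c(100, 83, 65, 45, 12, 13, 16, 68, 30, 30),
  Que   = c(100, 80, 30, 30, 18, 10, 16, 100, 35, 26),
  Kudos = c(100, 75, 29, 22, 16, 9, 19, 100, 29, 22)
)

# Centers copied (rounded) from the k = 2 printout
centers <- rbind(
  c(83.33333, 83.66667, 93.33333, 91.66667),
  c(30.28571, 30.14286, 23.57143, 20.85714)
)
colnames(centers) <- c("Ans", "Acc", "Que", "Kudos")

x <- as.matrix(dat[, colnames(centers)])

# d[i, k] = Euclidean distance from point i to center k
d <- sapply(seq_len(nrow(centers)), function(k) {
  sqrt(rowSums(sweep(x, 2, centers[k, ])^2))
})

# Nearest center per point and the distance to it
nearest <- apply(d, 1, which.min)
own     <- d[cbind(seq_len(nrow(x)), nearest)]

ranked <- data.frame(id = dat$id, cluster = nearest, dist = own)
ranked <- ranked[order(ranked$cluster, ranked$dist), ]
print(ranked)
```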
NB:
I do not necessarily need to use kmeans, but I want to specify the number of clusters and get an ordered list of points from these clusters.