Kmeans inter and intragroup ordering

Question

I am wondering what other people do about ordering k-means clusters. I make heatmaps (mainly of ChIP-Seq data) and get nice-looking figures with a custom heatmap function (based on R's built-in heatmap function). However, I would like to make two improvements. The first is to order my clusters by decreasing mean. For example, the following code:

    fit = kmeans(data, 8, iter.max=50, nstart=10)
    d = data.frame(data, symbol)
    d = data.frame(d, fit$cluster)
    d = d[order(d$fit.cluster),]

gives me a data.frame ordered by the cluster column. What is the best way to arrange the rows so that the 8 clusters appear in order of their respective means?

Secondly, do you recommend sorting the rows WITHIN each cluster from highest mean to lowest? This gives a more organized view of the data, but it may trick a careless observer into inferring something they should not. If you do recommend it, what is the most efficient way to do it?

Ron gejman Jan 24 '11 at 20:13

1 answer

Gavin simpson · Accepted Answer · 2011-01-25T16:43:06+0000

Not an exact answer to what you are asking, but perhaps you could consider seriation instead of k-means clustering. It is a bit like ordination rather than clustering, but one end result is a heatmap of the seriated data, which sounds similar to what you are doing with k-means followed by a specially ordered heatmap.

There is an R package for seriation called seriation, and it has a vignette that you can get directly from CRAN.
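As a rough illustration of the idea (not taken from the vignette), the sketch below seriates the rows of a small toy matrix and draws the reordered heatmap. The toy data, the choice of `method = "OLO"` (optimal leaf ordering) and the base-R `hclust` fallback for when the seriation package is not installed are all my own assumptions.

```r
set.seed(1)
## toy data (hypothetical): 30 rows, 3 variables, two shifted groups
x <- rbind(matrix(rnorm(45, mean = 2), ncol = 3),
           matrix(rnorm(45, mean = -2), ncol = 3))
## randomise the rows
x <- x[sample(nrow(x)), ]

## distances between rows
d <- dist(x)

if (requireNamespace("seriation", quietly = TRUE)) {
  ## seriate the rows; "OLO" is one of several available methods
  ord <- seriation::get_order(seriation::seriate(d, method = "OLO"))
} else {
  ## assumption: fall back to plain dendrogram leaf order from base R
  ord <- hclust(d)$order
}
ord <- as.integer(unname(ord))

## heatmap of the rows in seriated order
image(1:3, 1:30, t(x[ord, ]), ylab = "Observations", xlab = "Variables",
      xaxt = "n", main = "Seriated")
axis(1, at = 1:3)
```

Unlike k-means, this gives a single ordering of all rows and no cluster labels, so it suits cases where the row order matters more than the group membership.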

I will answer the specifics of Q as soon as I have prepared an example to try.

OK, a proper answer, following on from your comment above. First, some dummy data: 3 clusters of 10 samples on each of 3 variables.

    set.seed(1)
    dat <- data.frame(A = c(rnorm(10, 2), rnorm(10, -2), rnorm(10, -2)),
                      B = c(rnorm(10, 0), rnorm(10, 5), rnorm(10, -2)),
                      C = c(rnorm(10, 0), rnorm(10, 0), rnorm(10, -10)))
    ## randomise the rows
    dat <- dat[sample(nrow(dat)),]
    clus <- kmeans(scale(dat, scale = FALSE), centers = 3, iter.max = 50, nstart = 10)
    ## means of n points in each cluster
    mns <- sapply(split(dat, clus$cluster), function(x) mean(unlist(x)))
    ## order the data by cluster with clusters ordered by `mns`, low to high
    dat2 <- do.call("rbind", split(dat, clus$cluster)[order(mns)])
    ## heatmaps
    ## original first, then reordered:
    layout(matrix(1:2, ncol = 2))
    image(1:3, 1:30, t(data.matrix(dat)), ylab = "Observations",
          xlab = "Variables", xaxt = "n", main = "Original")
    axis(1, at = 1:3)
    image(1:3, 1:30, t(data.matrix(dat2)), ylab = "Observations",
          xlab = "Variables", xaxt = "n", main = "Reordered")
    axis(1, at = 1:3)
    layout(1)

Yielding:

[Figure: two side-by-side heatmaps, "Original" and "Reordered", with variables on the x-axis and observations on the y-axis]
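The question also asks about decreasing order between clusters and about sorting rows within each cluster. One possible way to do both in a single `order()` call is sketched below, on the same kind of dummy data. Using the row mean as the within-cluster sort key, and the names `row_mean`, `clus_mean`, `key` and `dat3`, are my own assumptions for illustration.

```r
set.seed(1)
dat <- data.frame(A = c(rnorm(10, 2), rnorm(10, -2), rnorm(10, -2)),
                  B = c(rnorm(10, 0), rnorm(10, 5), rnorm(10, -2)),
                  C = c(rnorm(10, 0), rnorm(10, 0), rnorm(10, -10)))
## randomise the rows
dat <- dat[sample(nrow(dat)), ]
clus <- kmeans(scale(dat, scale = FALSE), centers = 3, iter.max = 50, nstart = 10)

## mean of each row across the variables
row_mean <- rowMeans(dat)
## mean of each cluster (all rows have the same number of columns,
## so the mean of row means equals the mean of all values in the cluster)
clus_mean <- tapply(row_mean, clus$cluster, mean)
## look up the cluster mean for every row
key <- as.numeric(clus_mean[as.character(clus$cluster)])

## sort by cluster mean (high to low), then cluster id as a tie-breaker,
## then row mean within the cluster (high to low)
ord <- order(-key, clus$cluster, -row_mean)
dat3 <- dat[ord, ]

## heatmap of the doubly ordered data (row 1 is drawn at the bottom)
image(1:3, 1:30, t(data.matrix(dat3)), ylab = "Observations",
      xlab = "Variables", xaxt = "n", main = "Ordered between and within")
axis(1, at = 1:3)
```

Dropping the minus signs gives the low-to-high order used above. As the question notes, the within-cluster sort makes the picture tidier but can suggest a gradient that the clustering itself did not find, so it is worth stating in the figure legend.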
