I am new to R, and I am trying to cluster a data table in which rows represent individual objects and columns represent features that were measured for these objects. I worked through some clustering tutorials and I do get a result; however, the heat map that I get after clustering does not correspond at all to the heat map obtained from the same data table with another program. While the heat map from that program shows clear differences in marker expression between objects, mine does not show large differences, and I cannot recognize any clustering structure (i.e. blocks of color) in it; it just looks like a random jumble of similar colors without much contrast. Here is an example of the code I am using; maybe someone has an idea of what I might be doing wrong.
    mydata <- read.table("mydata.csv")
    datamat <- as.matrix(mydata)
    datalog <- log(datamat)
I use log values for clustering because I know that the other program does this too.
    library(gplots)
    hr <- hclust(as.dist(1-cor(t(datalog), method="pearson")), method="complete")
    mycl <- cutree(hr, k=7)
    mycol <- sample(rainbow(256))
    mycol <- mycol[as.vector(mycl)]
    heatmap(datamat, Rowv=as.dendrogram(hr), Colv=NA, col=colorpanel(40, "black","yellow","green"), scale="column", RowSideColors=mycol)
Again, I plot the original values but cluster on the log values, because I know that this is what the other program does.
I tried playing with the methods, but I did not get anything that even remotely resembled a clustered heat map. When I take out the scaling, the heat map becomes very dark (and I am fairly sure that the data should be scaled or normalized by column somehow). I also tried clustering with k-means, but that did not help either. My idea was that the color range could not be fully used because of two outliers, but although removing them slightly increased the range of colors shown on the heat map, it still did not reveal proper clusters.
Is there anything else I could play with?
Also, is it possible to change the color scale of a heat map so that the outliers fall into a last bin that covers the range "greater than a certain value"? I tried to do this with heatmap.2 (the "breaks" argument), but I did not quite succeed, and I could not get the row side colors that I use with the heatmap function to work there.
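A minimal sketch of how the "breaks" idea might be set up, in case it helps to pin down what is being asked. Everything here is an assumption for illustration: the matrix `x` is made-up data with two artificial outliers, the cutoff `cap` (the 95th percentile) is arbitrary, and `colorRampPalette` stands in for `colorpanel`. The point is only that `heatmap.2` expects one more break than colors, so a final break at `max(x)` turns the last color into a "greater than the cutoff" bin.

```r
# Hypothetical example data: 20 objects x 5 markers, with two extreme outliers.
set.seed(1)
x <- matrix(rexp(100, rate = 0.1), nrow = 20,
            dimnames = list(paste0("obj", 1:20), paste0("marker", 1:5)))
x[1, 1] <- 5000
x[2, 3] <- 8000

n_col <- 40
cap <- unname(quantile(x, 0.95))  # assumed cutoff; values above it share the last bin

# n_col - 1 equal-width bins from min(x) up to the cutoff,
# plus one wide final bin from the cutoff to max(x).
breaks <- c(seq(min(x), cap, length.out = n_col), max(x))
cols <- colorRampPalette(c("black", "yellow", "green"))(n_col)

# Bin index of every value, to check where the outliers end up.
bin <- findInterval(x, breaks, rightmost.closed = TRUE, all.inside = TRUE)

# Row clustering and one side color per row, as in the heatmap() call above.
hr <- hclust(dist(x), method = "complete")
row_cols <- rainbow(3)[cutree(hr, k = 3)]

# Only drawn if gplots is installed.
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(x, Rowv = as.dendrogram(hr), Colv = FALSE, dendrogram = "row",
                    breaks = breaks, col = cols, scale = "none",
                    RowSideColors = row_cols, trace = "none")
}
```

Note that with `scale="column"` the breaks would apply to the scaled values rather than the raw ones, so the cutoff would have to be chosen on that scale.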