Clustering and heatmap in R

I am new to R, and I am trying to do some clustering on a data table in which rows represent individual objects and columns represent functions that were measured for these objects. I worked through some clustering tutorials and got some results; however, the heatmap I get after clustering does not resemble at all the heatmap another program produces from the same data table. While that program's heatmap shows clear differences in marker expression between objects, mine shows no large differences, and I cannot recognize any cluster structure (i.e. color) in it: it looks like a random jumble of similar colors without much contrast. Here is an example of the code I'm using; maybe someone has an idea of what I might be doing wrong.

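    mydata <- read.table("mydata.csv")  # splits on whitespace by default; comma-separated files need read.csv()
    datamat <- as.matrix(mydata)
    datalog <- log(datamat)             # natural log of every value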

I use log values for clustering because I know that the other program does this too.

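    library(gplots)  # for colorpanel()
    # Cluster the rows (objects) of the log data, with 1 - Pearson correlation as distance
    hr <- hclust(as.dist(1 - cor(t(datalog), method = "pearson")), method = "complete")
    mycl <- cutree(hr, k = 7)            # cut the tree into 7 clusters
    mycol <- sample(rainbow(256))        # random palette for the cluster labels
    mycol <- mycol[as.vector(mycl)]      # one side color per row
    # Plot the original (unlogged) values, scaled by column
    heatmap(datamat, Rowv = as.dendrogram(hr), Colv = NA,
            col = colorpanel(40, "black", "yellow", "green"),
            scale = "column", RowSideColors = mycol)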

Again, I plot the original values, but I cluster on the log values because I know this is what the other program does.

I tried playing with the methods, but I didn't get anything that even remotely resembled a clustered heatmap. When I remove the scaling, the heatmap becomes very dark (and I am actually quite sure that I need to scale or normalize the data by column somehow). I also tried clustering with k-means, but again it did not help. My idea was that two outliers prevented the color scheme from being used fully, but although removing them slightly widened the range of colors shown in the heatmap, it still did not reveal the right clusters.
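For reference, `heatmap(..., scale = "column")` centres each column on its mean and divides it by its standard deviation, the same computation base R's `scale()` performs. The sketch below makes that step explicit on simulated data with two artificial outliers, then caps the z-scores at a hypothetical limit of 3 as one possible way to stop a few extreme cells from stretching the color scale. The data, the cap and the palette are assumptions for illustration, not taken from the question.

```r
set.seed(7)
# Hypothetical positive data: 50 objects x 6 measured functions
datamat <- matrix(rexp(50 * 6, rate = 1 / 100), nrow = 50)
datamat[3, 2] <- 1e5    # two artificial outliers
datamat[10, 5] <- 5e4
datalog <- log(datamat)

# Column z-scores: the centring/scaling heatmap(scale = "column") applies
z <- scale(datalog)

# Cap |z| at a hypothetical limit so extreme cells share the end colors
# instead of compressing everything else into the middle of the palette
cap <- 3
z_capped <- pmin(pmax(z, -cap), cap)

# Already scaled, so tell heatmap() not to scale again
heatmap(z_capped, scale = "none", Colv = NA,
        col = colorRampPalette(c("black", "yellow", "green"))(40))
```

Because the matrix is scaled beforehand, `scale = "none"` is used in the call; scaling twice would undo the capping.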

Is there anything else I could play with?

And is it possible to change the color scale of a heatmap so that the last bin holds the outliers, i.e. everything "above a certain value"? I tried to do this with heatmap.2 (the "breaks" argument), but I didn't quite succeed, and I also could not set the row side colors the way I do with the heatmap function.

1 answer

If you are willing to use heatmap.2 from the gplots package, it lets you add breaks that assign colors to ranges of the values shown in your heatmap.
For example, if you had 3 colors, blue, white and red, for values going from low to high, you could do something like this:

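    library(gplots)  # provides heatmap.2() and bluered()
    # 17 break points -> 16 color bins, matching bluered(16)
    my.breaks <- c(seq(-5, -0.6, length.out = 6),
                   seq(-0.5999999, 0.1, length.out = 4),
                   seq(0.100009, 5, length.out = 7))
    # mtscaled is my own, already scaled data matrix;
    # symm = TRUE is only valid for a square matrix
    result <- heatmap.2(mtscaled, Rowv = TRUE, scale = "none", dendrogram = "row",
                        symm = TRUE, col = bluered(16), breaks = my.breaks)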
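To see how a break vector like this maps values to colors without needing gplots, the sketch below rebuilds `my.breaks` in base R, uses `colorRampPalette(c("blue", "white", "red"))(16)` as a stand-in for `bluered(16)`, and bins a few hypothetical probe values with `cut()`. This approximates how `image()`, which heatmap.2 uses for drawing, assigns cells to colors; a value outside the outermost breaks gets `NA` and is left uncolored.

```r
# Rebuild the break vector from the answer
my.breaks <- c(seq(-5, -0.6, length.out = 6),
               seq(-0.5999999, 0.1, length.out = 4),
               seq(0.100009, 5, length.out = 7))

# Base-R stand-in for gplots::bluered(16): a blue-white-red ramp
cols <- colorRampPalette(c("blue", "white", "red"))(16)

# There must be exactly one more break than colors, in increasing order
stopifnot(length(my.breaks) == length(cols) + 1,
          !is.unsorted(my.breaks, strictly = TRUE))

# Hypothetical probe values; 6 lies outside [-5, 5]
probe <- c(-4, -1, 0, 0.05, 1, 4, 6)
# Right-closed bins, lowest break included
bin <- cut(probe, my.breaks, include.lowest = TRUE, labels = FALSE)
data.frame(value = probe, bin = bin, colour = cols[bin])
```

Negative probes land in the low (blue) bins, values near zero in the middle bins, and positive ones in the high (red) bins, which is the three-range split described in the answer.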

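In this case, the three seq() calls define 3 ranges of values that correspond to the blue, white and red parts of the palette; 17 break points give 16 bins, one per color in bluered(16). The actual values will differ depending on the range of values in your data.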
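For the question's "more than a certain value" bin, one possible approach is to make the last break the maximum of the data: every value above the threshold then falls into the top bin, however large it is. A minimal sketch with base R's `heatmap()`, which passes `col` and `breaks` through to `image()`; the simulated matrix, the threshold `thr = 2`, the bin count and the colors are all hypothetical. The same break and color vectors should also be usable with heatmap.2's `breaks` and `col` arguments.

```r
set.seed(1)
x <- matrix(rnorm(200), nrow = 20)  # hypothetical data
x[1, 1] <- 15                       # an artificial outlier far above the rest

thr <- 2       # hypothetical cut-off: everything above it gets the last color
n_cols <- 10   # total number of colors

# n_cols - 1 evenly spaced bins from min(x) up to thr,
# plus one final bin (thr, max(x)] for all values above the threshold
breaks <- c(seq(min(x), thr, length.out = n_cols), max(x))
cols <- c(colorRampPalette(c("black", "yellow"))(n_cols - 1), "green")

# Bin index of every cell (right-closed bins, lowest break included)
bin <- cut(as.vector(x), breaks, include.lowest = TRUE, labels = FALSE)

# Breaks are on the raw scale, so turn off heatmap()'s default row scaling
heatmap(x, scale = "none", col = cols, breaks = breaks)
```

This assumes `max(x) > thr`; if no value exceeds the threshold, the last two breaks coincide and `image()` will refuse them.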

One thing you do in your code is call hclust on your data and then call heatmap on the result; however, the heatmap help page states that hclustfun defaults to hclust. Therefore, I do not think you need to do this. You might want to take a look at some similar questions I asked, which may help point you in the right direction:
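Note that heatmap()'s default `distfun` is `dist` (Euclidean distance), so to reproduce the question's 1 - Pearson distance it has to be passed in explicitly. A minimal sketch on simulated data, where `cor_dist` and `complete_hclust` are hypothetical helper names: heatmap() then computes `hclustfun(distfun(x))` on the rows itself. It also reorders the dendrogram by row means by default, so the leaf order can differ from `plot(hr)`. If you want `RowSideColors`, you still need cluster labels up front (e.g. from `cutree()` on your own hclust), so precomputing is not wrong in that case.

```r
set.seed(42)
# Hypothetical positive data standing in for datamat
datamat <- matrix(rexp(60 * 8), nrow = 60)
datalog <- log(datamat)

# 1 - Pearson correlation between rows (objects), as in the question
cor_dist <- function(m) as.dist(1 - cor(t(m), method = "pearson"))
# Complete-linkage hierarchical clustering, as in the question
complete_hclust <- function(d) hclust(d, method = "complete")

# heatmap() clusters the rows itself; Colv = NA keeps the column order
hm <- heatmap(datalog, distfun = cor_dist, hclustfun = complete_hclust,
              Colv = NA, scale = "column", keep.dendro = TRUE)
```

With `keep.dendro = TRUE`, the invisibly returned list includes the row dendrogram (`hm$Rowv`) alongside the row and column orderings (`hm$rowInd`, `hm$colInd`).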

Heatmap question 1

Heatmap question 2

If you post an image of your heatmap together with an image of the heatmap the other program produces, it will be easier for us to help you.


Source: https://habr.com/ru/post/1402877/

