Walk through the hierarchical tree

I want to be able to "walk" (iterate) through a hierarchical clustering tree (see the picture and the code below). Specifically, I want:

  • A function that takes a matrix and a minimum height. Let's say 10 in this example.

    splitme <- function(matrix, minH) {
      ## Some code
    }
  • Starting from the top and going down to minH, cut the tree whenever a new split appears. This is the first problem: how to detect a new split and get its height h.

  • At this particular h, how many clusters are there? Extract the clusters.

     mycl <- cutree(hr, h = x)        # x is the height h found above
     count <- length(unique(mycl))    # number of clusters at that height
  • Store each of the new matrices in a variable (or variables). This is another difficult task: dynamically creating x new matrices. Perhaps a function could take the clusters, do whatever needs to be done (comparisons), and return a variable.

  • Repeat the previous two steps until minH is reached.
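One way to approach the first two steps is sketched below. It assumes the tree is an hclust object, whose height component holds the merge heights; each distinct height is a point where the number of clusters changes. The helper names split_heights and n_clusters are hypothetical, and the toy data (five points on a line) is only there to make the heights easy to verify by hand.

```r
# Toy data: five points on a line, so the merge heights can be checked by hand.
x <- matrix(c(0, 1, 3, 7, 15), ncol = 1)
hr <- hclust(dist(x))   # complete linkage by default

# split_heights() is a hypothetical helper: the heights above minH at which
# the tree changes, ordered from the top of the tree downwards.
split_heights <- function(tree, minH) {
  h <- sort(unique(tree$height), decreasing = TRUE)
  h[h > minH]
}

# n_clusters() is a hypothetical helper: number of clusters after cutting at h.
n_clusters <- function(tree, h) {
  length(unique(cutree(tree, h = h)))
}

hs <- split_heights(hr, minH = 2)
counts <- vapply(hs, function(h) n_clusters(hr, h), integer(1))
data.frame(height = hs, clusters = counts)
```

Cutting exactly at the topmost height returns a single cluster (the whole tree), so a walk that is only interested in real splits would normally start from the second height.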

Picture

[Image: dendrogram produced by the code below, with branches colored red, orange, and blue]

Code

    # Generate data
    set.seed(12345)
    desc.1 <- c(rnorm(10, 0, 1), rnorm(20, 10, 4))
    desc.2 <- c(rnorm(5, 20, .5), rnorm(5, 5, 1.5), rnorm(20, 10, 2))
    desc.3 <- c(rnorm(10, 3, .1), rnorm(15, 6, .2), rnorm(5, 5, .3))
    data <- cbind(desc.1, desc.2, desc.3)

    # Create dendrogram
    d <- dist(data)
    hc <- as.dendrogram(hclust(d))

    # Function to color branches
    colbranches <- function(n, col) {
      a <- attributes(n) # Find the attributes of current node
      # Color edges with requested color
      attr(n, "edgePar") <- c(a$edgePar, list(col = col, lwd = 2))
      n # Don't forget to return the node!
    }

    # Color the first sub-branch of the first branch in red,
    # the second sub-branch in orange and the second branch in blue
    hc[[1]][[1]] <- dendrapply(hc[[1]][[1]], colbranches, "red")
    hc[[1]][[2]] <- dendrapply(hc[[1]][[2]], colbranches, "orange")
    hc[[2]] <- dendrapply(hc[[2]], colbranches, "blue")

    # Plot
    plot(hc)
1 answer

I think what you essentially need is the cophenetic distance matrix of the dendrogram, as returned by cophenetic(). Its distinct values are the heights of all the splitting points, and from there you can easily walk through the tree. I made an attempt below that saves all the sub-matrices in a list called submatrices. This is a nested list: the first level has one element per splitting point, and the second level holds the sub-matrices obtained at that splitting point. Each sub-matrix is stored as the distance matrix (a dist object) of the rows in one cluster. For example, if you want all the sub-matrices from the first splitting point (the gray and blue clusters), use submatrices[[1]]. If you want the first sub-matrix (the red cluster) from submatrices[[1]], use submatrices[[1]][1]; note that single brackets return a one-element list, so use submatrices[[1]][[1]] to get the object itself.

    splitme <- function(data, minH) {
      ## Compute the distance matrix and the clustering dendrogram
      d <- dist(data)
      cl <- hclust(d)
      hc <- as.dendrogram(cl)

      ## Get the cophenetic distance matrix, rounded to whole numbers
      ## (cccm is the original variable name)
      cccm <- round(cophenetic(hc), digits = 0)

      ## Get the heights of the splitting points (sps), from top to bottom
      sps <- sort(unique(cccm), decreasing = TRUE)

      ## This list stores all the submatrices.
      ## Element n holds the submatrices extracted at the nth splitting point
      ## (the top splitting point is the 1st, the bottom one is the last)
      submatrices <- list()

      ## Iterate/walk the dendrogram
      i <- 2 # Start from 2: the 1st value gives the entire dendrogram as a whole
      ## The bounds check stops the loop when the heights run out before minH
      while (i <= length(sps) && sps[i] > minH) {
        membership <- cutree(cl, h = sps[i]) # Cut the tree at the splitting point
        lst <- list() # Submatrices extracted at this splitting point
        for (j in 1:max(membership)) {
          member <- which(membership == j) # Rows of data that belong to cluster j
          dm <- dist(data[member, , drop = FALSE]) # Distance matrix of those rows
          lst <- append(lst, list(dm)) # Append the submatrix to lst
        }
        submatrices <- append(submatrices, list(lst)) # Append lst to the result
        i <- i + 1
      }
      return(submatrices)
    }
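The nested-list layout and the link between cophenetic distances and merge heights can be checked on a small example. The sketch below is a variation, not the code above: it keeps the raw rows of each cluster instead of their distance matrix, builds the lists with lapply and split, and uses a hypothetical helper named walk_tree. The toy data and row names are assumptions made for illustration.

```r
# Toy data: five labelled points on a line, so the result is easy to verify.
x <- matrix(c(0, 1, 3, 7, 15), ncol = 1,
            dimnames = list(paste0("p", 1:5), "x"))
cl <- hclust(dist(x))   # complete linkage by default

# The distinct cophenetic distances are the merge heights of the tree.
coph_heights  <- sort(unique(as.vector(cophenetic(cl))))
merge_heights <- sort(unique(cl$height))

# walk_tree() is a hypothetical helper, not part of any package.
# It returns one list element per splitting point above minH; each element
# is itself a list with one sub-matrix of rows per cluster.
walk_tree <- function(data, cl, minH) {
  hs <- sort(unique(cl$height), decreasing = TRUE)
  hs <- hs[-1]            # drop the root: cutting there gives one cluster
  hs <- hs[hs > minH]
  lapply(hs, function(h) {
    membership <- cutree(cl, h = h)
    lapply(split(seq_len(nrow(data)), membership),
           function(rows) data[rows, , drop = FALSE])
  })
}

subs <- walk_tree(x, cl, minH = 2)
length(subs)        # number of splitting points visited
length(subs[[1]])   # number of clusters at the first splitting point
subs[[1]][[1]]      # the first sub-matrix itself
subs[[1]][1]        # a one-element list wrapping that sub-matrix
```

Using lapply avoids growing lists with append inside loops, and drop = FALSE keeps single-row clusters as matrices rather than collapsing them to vectors.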

Source: https://habr.com/ru/post/958442/

