How to collapse branches in a phylogenetic tree by a label in their nodes or leaves?

I built a phylogenetic tree for a family of proteins that can be divided into different groups, classifying each by receptor type or response type. Nodes in the tree are marked as receptor type.

In the phylogenetic tree, I see that proteins belonging to the same groups or receptor type are grouped together in the same branches. Therefore, I would like to collapse these branches that have common tags, grouping them according to a specific list of keywords.

The command will look something like this:

./collapse_tree_by_label -f phylogenetic_tree.newick -l list_of_labels_to_collapse.txt -o collapsed_tree.eps (or pdf)

My list_of_labels_to_collapse.txt will look like this: In C D

My new tree will look like this: (A_1: 0.05, A_2: 0.03, A_3: 0.2, A_4: 0.1): 0.9, (((B_1: 0.05, B_2: 0.02, B_3: 0.04): 0.6, (C_1: 0.6, C_2: 0.08): 0.7): 0.5, (D_1: 0.3, D_2: 0.4, D_3: 0.5, D_4 : 0.7, D_5: 0.4): 1.2)

The output image without smoothing looks like this: http://i.stack.imgur.com/pHkoQ.png

Collapsing the output image should be like this (collapsed_tree.eps): http://i.stack.imgur.com/TLXd0.png

The width of the triangles should represent the length of the branch, and the top of the triangles should represent the number of nodes in the branches.

I played with the β€œape” package in R. I managed to build a phylogenetic tree, but I still can’t figure out how to collapse the branches by keywords in the labels:

require("ape") 

This will load the tree:

 cat("((A_1:0.05,A_2:0.03,A_3:0.2,A_4:0.1):0.9,(((B_1:0.05,B_2:0.02,B_3:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);", file = "ex.tre", sep = "\n") tree.test <- read.tree("ex.tre") 

There should be a code to reset

This will build a tree:

 plot(tree.test) 
+5
source share
4 answers

Your tree stored in R already has tips stored as polytomies. It is simply a matter of constructing a tree with triangles representing polytomies.

There is no function in ape for this, which I know about, but if you are a little in touch with the charting function, you can disable it

 # Step 1: make edges for descendent nodes invisible in plot: groups <- c("A", "B", "C", "D") group_edges <- numeric(0) for(group in groups){ group_edges <- c(group_edges,getMRCA(tree.test,tree.test$tip.label[grepl(group, tree.test$tip.label)])) } edge.width <- rep(1, nrow(tree.test$edge)) edge.width[tree.test$edge[,1] %in% group_edges ] <- 0 # Step 2: plot the tree with the hidden edges plot(tree.test, show.tip.label = F, edge.width = edge.width) # Step 3: add triangles add_polytomy_triangle <- function(phy, group){ root <- length(phy$tip.label)+1 group_node_labels <- phy$tip.label[grepl(group, phy$tip.label)] group_nodes <- which(phy$tip.label %in% group_node_labels) group_mrca <- getMRCA(phy,group_nodes) tip_coord1 <- c(dist.nodes(phy)[root, group_nodes[1]], group_nodes[1]) tip_coord2 <- c(dist.nodes(phy)[root, group_nodes[1]], group_nodes[length(group_nodes)]) node_coord <- c(dist.nodes(phy)[root, group_mrca], mean(c(tip_coord1[2], tip_coord2[2]))) xcoords <- c(tip_coord1[1], tip_coord2[1], node_coord[1]) ycoords <- c(tip_coord1[2], tip_coord2[2], node_coord[2]) polygon(xcoords, ycoords) } 

Then you just need to scroll through the groups to add triangles

 for(group in groups){ add_polytomy_triangle(tree.test, group) } 
+4
source

I think the script is finally doing what I wanted. From the answer provided by @CactusWoman, I slightly modified the code so that the script tried to find the MRCA that represents the largest branch that matches my search pattern. This solved the problem of not merging non-polytomic branches or destroying the entire tree, because one node match was mistakenly outside the correct branch.

In addition, I included a parameter that represents the limit for the fill factor of the picture in this branch, so we can select and collapse / group branches that have at least 90% of its tips matching the search pattern, for example,

 library(geiger) library(phylobase) library(ape) #functions find_best_mrca <- function(phy, group, threshold){ group_matches <- phy$tip.label[grepl(group, phy$tip.label, ignore.case=TRUE)] group_mrca <- getMRCA(phy,phy$tip.label[grepl(group, phy$tip.label, ignore.case=TRUE)]) group_leaves <- tips(phy, group_mrca) match_ratio <- length(group_matches)/length(group_leaves) if( match_ratio < threshold){ #start searching for children nodes that have more than 95% of descendants matching to the search pattern mrca_children <- descendants(as(phy,"phylo4"), group_mrca, type="all") i <- 1 new_ratios <- NULL nleaves <- NULL names(mrca_children) <- NULL for(new_mrca in mrca_children){ child_leaves <- tips(tree.test, new_mrca) child_matches <- grep(group, child_leaves, ignore.case=TRUE) new_ratios[i] <- length(child_matches)/length(child_leaves) nleaves[i] <- length(tips(phy, new_mrca)) i <- i+1 } match_result <- data.frame(mrca_children, new_ratios, nleaves) match_result_sorted <- match_result[order(-match_result$nleaves,match_result$new_ratios),] found <- numeric(0); print(match_result_sorted) for(line in 1:nrow(match_result_sorted)){ if(match_result_sorted$ new_ratios[line]>=threshold){ return(match_result_sorted$mrca_children[line]) found <- 1 } } if(found==0){return(found)} }else{return(group_mrca)} } add_triangle <- function(phy, group,phylo_plot){ group_node_labels <- phy$tip.label[grepl(group, phy$tip.label)] group_mrca <- getMRCA(phy,group_node_labels) group_nodes <- descendants(as(tree.test,"phylo4"), group_mrca, type="tips") names(group_nodes) <- NULL x<-phylo_plot$xx y<-phylo_plot$yy x1 <- max(x[group_nodes]) x2 <-max(x[group_nodes]) x3 <- x[group_mrca] y1 <- min(y[group_nodes]) y2 <- max(y[group_nodes]) y3 <- y[group_mrca] xcoords <- c(x1,x2,x3) ycoords <- c(y1,y2,y3) polygon(xcoords, ycoords) return(c(x2,y3)) } #main cat("((A_1:0.05,E_2:0.03,A_3:0.2,A_4:0.1,A_5:0.1,A_6:0.1,A_7:0.35,A_8:0.4,A_9:01,A_10:0.2):0.9,((((B_1:0.05,B_2:0.05):0.5,B_3:0.02,B_4:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);", file = "ex.tre", sep = "\n") tree.test <- read.tree("ex.tre") # Step 1: Find the best MRCA that matches to the keywords or search patten groups <- c("A", "B|C", "D") group_labels <- groups group_edges <- numeric(0) edge.width <- rep(1, nrow(tree.test$edge)) count <- 1 for(group in groups){ best_mrca <- find_best_mrca(tree.test, group, 0.90) group_leaves <- tips(tree.test, best_mrca) groups[count] <- paste(group_leaves, collapse="|") group_edges <- c(group_edges,best_mrca) #Step2: Remove the edges of the branches that will be collapsed, so they become invisible edge.width[tree.test$edge[,1] %in% c(group_edges[count],descendants(as(tree.test,"phylo4"), group_edges[count], type="all")) ] <- 0 count = count +1 } #Step 3: plot the tree hiding the branches that will be collapsed/grouped last_plot.phylo <- plot(tree.test, show.tip.label = F, edge.width = edge.width) #And save a copy of the plot so we can extract the xy coordinates of the nodes #To get the x & y coordinates of a plotted tree created using plot.phylo #or plotTree, we can steal from inside tiplabels: last_phylo_plot<-get("last_plot.phylo",envir=.PlotPhyloEnv) #Step 4: Add triangles and labels to the collapsed nodes for(i in 1:length(groups)){ text_coords <- add_triangle(tree.test, groups[i],last_phylo_plot) text(text_coords[1],text_coords[2],labels=group_labels[i], pos=4) } 
+1
source

I have also searched for this tool for centuries, not so much for collapsing categorical groups as for collapsing internal nodes based on the numerical value of support.

The di2multi function in the ape package can collapse nodes in polytomies, but currently it can only do this by the threshold of the branch length. The following is a rough adaptation, which allows you to collapse instead using the node support threshold (default threshold = 0.5).

Use at your own risk, but it works for me on my root Bayesian tree.

 di2multi4node <- function (phy, tol = 0.5) # Adapted di2multi function from the ape package to plot polytomies # based on numeric node support values # (di2multi does this based on edge lengths) # Needs adjustment for unrooted trees as currently skips the first edge { if (is.null(phy$edge.length)) stop("the tree has no branch length") if (is.na(as.numeric(phy$node.label[2]))) stop("node labels can't be converted to numeric values") if (is.null(phy$node.label)) stop("the tree has no node labels") ind <- which(phy$edge[, 2] > length(phy$tip.label))[as.numeric(phy$node.label[2:length(phy$node.label)]) < tol] n <- length(ind) if (!n) return(phy) foo <- function(ancestor, des2del) { wh <- which(phy$edge[, 1] == des2del) for (k in wh) { if (phy$edge[k, 2] %in% node2del) foo(ancestor, phy$edge[k, 2]) else phy$edge[k, 1] <<- ancestor } } node2del <- phy$edge[ind, 2] anc <- phy$edge[ind, 1] for (i in 1:n) { if (anc[i] %in% node2del) next foo(anc[i], node2del[i]) } phy$edge <- phy$edge[-ind, ] phy$edge.length <- phy$edge.length[-ind] phy$Nnode <- phy$Nnode - n sel <- phy$edge > min(node2del) for (i in which(sel)) phy$edge[i] <- phy$edge[i] - sum(node2del < phy$edge[i]) if (!is.null(phy$node.label)) phy$node.label <- phy$node.label[-(node2del - length(phy$tip.label))] phy } 
+1
source

This is my answer based on the function phytools::phylo.toBackbone , see http://blog.phytools.org/2013/09/even-more-on-plotting-subtrees-as.html and http: // blog. phytools.org/2013/10/finding-edge-lengths-of-all-terminal.html . First load the function at the end of the code.

 library(ape) library(phytools) #phylo.toBackbone library(phangorn) cat("((A_1:0.05,E_2:0.03,A_3:0.2,A_4:0.1,A_5:0.1,A_6:0.1,A_7:0.35,A_8:0.4,A_9:01,A_10:0.2):0.9,((((B_1:0.05,B_2:0.05):0.5,B_3:0.02,B_4:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);", file = "ex.tre", sep = "\n") phy <- read.tree("ex.tre") groups <- c("A", "B|C", "D") backboneoftree<-makebackbone(groups,phy) # tip.label clade.label N depth # 1 A_1 A 10 0.2481818 # 2 B_1 B|C 6 0.9400000 # 3 D_1 D 5 0.4600000 par(mfrow=c(2,2), mar=c(0,2,2,0) ) plot(phy, main="Complete tree" ) plot(backboneoftree) makebackbone<-function(groupings,phy){ listofspecies<-phy$tip.label listtopreserve<-list() lengthofclades<-list() meandistnode<-list() newedgelengths<-list() for (i in 1:length(groupings)){ groupings<-groups bestmrca<-getMRCA(phy,grep(groupings[i], phy$tip.label) ) mrcatips<-phy$tip.label[unlist(phangorn::Descendants(phy,bestmrca, type="tips") )] listtopreserve[i]<- mrcatips[1] meandistnode[i]<- mean(dist.nodes(phy)[unlist(lapply(mrcatips, function(x) grep(x, phy$tip.label) ) ),bestmrca] ) lengthofclades[i]<-length(mrcatips) provtree<-drop.tip(phy,mrcatips, trim.internal=F, subtree = T) n3<-length(provtree$tip.label) newedgelengths[i]<-setNames(provtree$edge.length[sapply(1:n3,function(x,y) which(y==x),y=provtree$edge[,2])], provtree$tip.label)[provtree$tip.label[grep("tips",provtree$tip.label)] ] } newtree<-drop.tip(phy,setdiff(listofspecies,unlist(listtopreserve)), trim.internal = T) n<-length(newtree$tip.label) newtree$edge.length[sapply(1:n,function(x,y) which(y==x),y=newtree$edge[,2])]<-unlist(newedgelengths)+unlist(meandistnode) trans<-data.frame(tip.label=newtree$tip.label,clade.label=groupings, N=unlist(lengthofclades), depth=unlist(meandistnode) ) rownames(trans)<-NULL print(trans) backboneoftree<-phytools::phylo.toBackbone(newtree,trans) return(backboneoftree) } 

enter image description here

EDIT: I have not tried this, but it may be a different answer: "Script and a function to convert the branches of the tip of the tree, that is, thickness or triangles, with the width of both correlations with certain parameters (for example, the number of types of treasure) (tip.branches.R ) " http://www.sysbot.biologie.uni-muenchen.de/en/people/cusimano/use_r.html http://www.sysbot.biologie.uni-muenchen.de/en/people/cusimano/tip .branches.R

+1
source

Source: https://habr.com/ru/post/1238880/


All Articles