Building a phylogenetic tree

I have a list of lists like this:

    matches = [
        [['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', 'Firmicutes'],
         ['class', 'Clostridia'], ['order', 'Clostridiales'],
         ['family', 'Lachnospiraceae'], ['genus', 'Lachnospira']],
        [['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', '"Proteobacteria"'],
         ['class', 'Gammaproteobacteria'], ['order', '"Vibrionales"'],
         ['family', 'Vibrionaceae'], ['genus', 'Catenococcus']],
        [['rootrank', 'Root'], ['domain', 'Archaea'], ['phylum', '"Euryarchaeota"'],
         ['class', '"Methanomicrobia"'], ['order', 'Methanomicrobiales'],
         ['family', 'Methanomicrobiaceae'], ['genus', 'Methanoplanus']],
    ]

I want to build a phylogenetic tree from them. I wrote a node class similar to this (partially based on this code):

    class Node(object):
        """Generic n-ary tree node object.

        Children are additive; no provision for deleting them.
        """

        def __init__(self, parent, category=None, name=None):
            self.parent = parent
            self.category = category
            self.name = name
            self.childList = []
            if parent is None:
                self.birthOrder = 0
            else:
                self.birthOrder = len(parent.childList)
                parent.childList.append(self)

        def fullPath(self):
            """Returns a list of children from root to self"""
            result = []
            parent = self.parent
            kid = self
            while parent:
                result.insert(0, kid)
                parent, kid = parent.parent, parent
            return result

        def ID(self):
            return '{0}|{1}'.format(self.category, self.name)

And then I try to build my tree like this:

    node = None
    for match in matches:
        for branch in match:
            category, name = branch
            node = Node(node, category, name)
        print([n.ID() for n in node.fullPath()])

This works for the first match, but the second match is appended below the end of the first one instead of starting again at the top of the tree. How should I do this? I tried several ways of looking up existing nodes by their identifier, but I cannot get it to work.

3 answers

The problem is that node is always the lowest node in the tree, and you always add to that node. You need to keep a reference to the root node. Since ['rootrank', 'Root'] appears at the top of each of the lists, I would recommend pulling it out of the lists and using it as the root. So you can do something like:

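    # ['rootrank', 'Root'] is assumed to have been removed from each match
    rootnode = Node(None, 'rootrank', 'Root')
    for match in matches:
        node = rootnode  # restart from the root for every match
        for branch in match:
            category, name = branch
            node = Node(node, category, name)
        print([n.ID() for n in node.fullPath()])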

This will make the matches list more readable and give the expected result.
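Note that restarting from the root still creates a fresh node for every entry, so two matches that share a prefix (for example the two Bacteria lineages) end up as two separate branches. If the goal is a single merged tree, one possible approach is to look for an existing child before creating a new one. The following is only a sketch under that assumption: `get_or_create_child`, `build_tree` and `show` are hypothetical helper names, the `Node` class is a trimmed stand-in for the one in the question, and the sample data is shortened to three ranks per match.

```python
class Node(object):
    """Trimmed stand-in for the Node class from the question."""

    def __init__(self, parent, category=None, name=None):
        self.parent = parent
        self.category = category
        self.name = name
        self.childList = []
        if parent is not None:
            parent.childList.append(self)

    def ID(self):
        return '{0}|{1}'.format(self.category, self.name)


def get_or_create_child(parent, category, name):
    """Return the child of `parent` with this rank and name, creating it if absent."""
    for child in parent.childList:
        if child.category == category and child.name == name:
            return child
    return Node(parent, category, name)


def build_tree(matches):
    """Merge several root-to-leaf rank lists into one tree and return its root."""
    root = Node(None, 'rootrank', 'Root')
    for match in matches:
        node = root  # every match starts again at the root
        for category, name in match:
            if (category, name) == ('rootrank', 'Root'):
                continue  # already represented by `root`
            node = get_or_create_child(node, category, name)
    return root


def show(node, depth=0):
    """Return the subtree as a list of indented ID strings, one per node."""
    lines = ['  ' * depth + node.ID()]
    for child in node.childList:
        lines.extend(show(child, depth + 1))
    return lines


# Shortened sample data (three ranks per match) in the question's format.
matches = [
    [['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', 'Firmicutes']],
    [['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', '"Proteobacteria"']],
    [['rootrank', 'Root'], ['domain', 'Archaea'], ['phylum', '"Euryarchaeota"']],
]
tree = build_tree(matches)
print('\n'.join(show(tree)))
```

With this data the root gets two children (Bacteria and Archaea), and both bacterial phyla hang off the same Bacteria node. The linear scan over childList is fine for small trees; for large inputs a dict keyed by (category, name) on each node would avoid the repeated search.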


I highly recommend using a phylogenetics library like DendroPy.

The standard way to write phylogenetic trees is the Newick format, which uses nested parentheses, for example ((A,B),C). If you use DendroPy, reading such a tree is as easy as:

    >>> import dendropy
    >>> tree1 = dendropy.Tree.get_from_string("((A,B),(C,D))", schema="newick")

or, to read from a stream:

    >>> tree1 = dendropy.Tree(stream=open("mle.tre"), schema="newick")

The library's creator maintains a good tutorial.
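To hand the rank lists from the question to a Newick reader, they first have to be turned into a Newick string. Here is a rough sketch of one way to do that with only the standard library. The helpers `insert_path` and `to_newick` are hypothetical names, the sample data is shortened to three ranks per match, and the code assumes Python 3.7+ (dicts keep insertion order) and taxon names without spaces or other characters that would need quoting in Newick.

```python
def insert_path(tree, names):
    """Insert a root-to-leaf list of names into a nested dict, sharing common prefixes."""
    node = tree
    for name in names:
        node = node.setdefault(name, {})


def to_newick(name, children):
    """Render one node and its subtree as a Newick fragment (no trailing ';')."""
    label = name.strip('"')  # drop the literal quotes some taxon names carry
    if not children:
        return label
    inner = ','.join(to_newick(n, c) for n, c in children.items())
    return '({0}){1}'.format(inner, label)


# Shortened sample data (three ranks per match) in the question's format.
matches = [
    [['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', 'Firmicutes']],
    [['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', '"Proteobacteria"']],
    [['rootrank', 'Root'], ['domain', 'Archaea'], ['phylum', '"Euryarchaeota"']],
]

tree = {}
for match in matches:
    insert_path(tree, [name for _category, name in match])

# Every match starts with 'Root', so the nested dict has exactly one top-level key.
root_name, root_children = next(iter(tree.items()))
newick = to_newick(root_name, root_children) + ';'
print(newick)
# prints ((Firmicutes,Proteobacteria)Bacteria,(Euryarchaeota)Archaea)Root;
```

The rank names (domain, phylum, and so on) are dropped here because plain Newick only carries node labels and optional branch lengths; if the ranks matter, they would have to be kept in a separate mapping.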


Do yourself a favor and do not reinvent the wheel. python-graph (aka pygraph) does everything you are asking for here, and most of what you are likely to ask for next.


Source: https://habr.com/ru/post/1495964/

