Get Xpath dynamically with ElementTree getpath ()

Question

Get Xpath dynamically with ElementTree getpath ()

I need to write a dynamic function that finds elements in the ATOM xml subtree, dynamically creating XPath for the element.

To do this, I wrote something like this:

tree = etree.parse(xmlFileUrl) e = etree.XPathEvaluator(tree, namespaces={'def':'http://www.w3.org/2005/Atom'}) entries = e('//def:entry') for entry in entries: mypath = tree.getpath(entry) + "/category" category = e(mypath)

There is no “category” in the above code because getpath () returns XPath without namespaces, while XPathEvaluator e () requires a namespace.

Although I know that I can use the path and provide the namespace in the XPathEvaluator call, I would like to know if getpath () is possible to return a “fully qualified” path using all namespaces, as this is convenient in some cases.

(This is a side question of my previous question: Python XpathEvaluator without namespace )

+4

python xpath lxml elementtree

puntofisso Oct 30 '12 at 9:47

source share

3 answers

Basically, using the standard xml.etree Python library, another visiting function is required. To achieve this, you can create a modified version of the iter method as follows:

 def etree_iter_path(node, tag=None, path='.'): if tag == "*": tag = None if tag is None or node.tag == tag: yield node, path for child in node: _child_path = '%s/%s' % (path, child.tag) for child, child_path in etree_iter_path(child, tag, path=_child_path): yield child, child_path

Then you can use this function to iterate the tree from the root of the node:

 from xml.etree import ElementTree xmldoc = ElementTree.parse(*path to xml file*) for elem, path in etree_iter_path(xmldoc.getroot()): print(elem, path)

+1

Davide brunato May 25 '16 at 14:51

source share

From the docs http://lxml.de/xpathxslt.html#the-xpath-class :

ElementTree objects have a getpath(element) method that returns an XPath structural absolute expression to find this element:

So the answer to your question is that getpath() will not return a "fully qualified" path, since otherwise there will be a function argument for this function, you will only be guaranteed that the returned xpath expression will find you an element.

You might be able to combine getpath and xpath (and the Xpath class) to do what you want.

0

jmh Oct 30 '12 at 12:55

source share

Simon sapin · Accepted Answer · 2012-11-24T15:03:43+0000

Instead of trying to build the full path from the root, you can evaluate the XPath expression with the record as the node base:

 tree = etree.parse(xmlFileUrl) nsmap = {'def':'http://www.w3.org/2005/Atom'} entries_expr = etree.XPath('//def:entry', namespaces=nsmap) category_expr = etree.XPath('category') for entry in entries_expr(tree): category = category_expr(entry)

If performance is not critical, you can simplify the code using the .xpath() method for elements, rather than pre-compiled expressions:

 tree = etree.parse(xmlFileUrl) nsmap = {'def':'http://www.w3.org/2005/Atom'} for entry in tree.xpath('//def:entry', namespaces=nsmap): category = entry.xpath('category')

Get Xpath dynamically with ElementTree getpath ()

More articles: