You don't say what context you are working in (XSLT?), but here is a suggestion using Python and lxml:

    from lxml import etree

    XML = """
    <root>
      <td>
        <a href="#">text_to_capture</a>
      </td>
      <td>
        <b><a href="#">text_to_capture</a></b>
      </td>
      <td>
        text_to_capture
      </td>
    </root>"""

    doc = etree.fromstring(XML)

    # Select every text node that descends from a <td>, at any depth
    expr = "//td//text()"
    texts = doc.xpath(expr)
    print(texts)  # includes whitespace-only nodes

    # Skip the whitespace-only nodes and trim the rest
    for t in texts:
        if t.strip():
            print(t.strip())

Output:

    ['\n    ', 'text_to_capture', '\n  ', '\n    ', 'text_to_capture', '\n  ', '\n    text_to_capture\n  ']
    text_to_capture
    text_to_capture
    text_to_capture

The expression //td//text() selects every text node that descends from a <td>, at any depth, so it works regardless of the names of the <td> child elements.
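If you would rather not filter out the whitespace-only nodes in Python, the filtering can probably be pushed into the XPath expression itself. The following is only a sketch of that variant, not something taken from your setup: it adds a [normalize-space()] predicate (standard XPath 1.0) to the same expression, and the sample XML is my own assumption about what your input looks like.

```python
from lxml import etree

# Sample input (an assumption; substitute your own document)
XML = """
<root>
  <td><a href="#">text_to_capture</a></td>
  <td><b><a href="#">text_to_capture</a></b></td>
  <td>
    text_to_capture
  </td>
</root>"""

doc = etree.fromstring(XML)

# [normalize-space()] is true only for text nodes that contain at least
# one non-whitespace character, so whitespace-only nodes are never selected.
expr = "//td//text()[normalize-space()]"

# The predicate only filters nodes; it does not trim them, so strip() is
# still needed to remove leading/trailing whitespace from each match.
texts = [t.strip() for t in doc.xpath(expr)]
print(texts)
```

The predicate decides which nodes are returned but leaves their content untouched, which is why the strip() call stays.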