I am using lxml's XPath support to extract parts of a web page. I am trying to get the contents of a <font> tag, including the HTML tags nested inside it. If I use
//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]
I get the right number of nodes, but they are returned as lxml objects (<Element font at 0x101fe5eb0>).
If I use
//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]/text()
I get exactly what I want, except that the HTML markup contained in the <font> nodes is missing.
If I use
//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]/node()
I get a mixture of text and lxml elements (e.g. something something <Element a at 0x102ac2140> something).
Is it possible to use a pure XPath query to retrieve the contents of the <font> nodes, or even to make lxml return the content as a string from the .xpath() method rather than as lxml objects?
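To make the three cases concrete, here is a minimal sketch that reproduces them. The HTML fragment is a hypothetical stand-in I made up to match the markup described above (the real page is not shown), and the variable names are only for illustration.

```python
from lxml import html

# Hypothetical fragment modeled on the markup described in the question.
doc = html.fromstring(
    '<table><tr><td valign="top"><p>'
    '<font face="verdana" color="#ffffff" size="2">'
    'something something <a href="url">inside</a> something'
    '</font></p></td></tr></table>'
)

base = ('//td[@valign="top"]/p[1]'
        '/font[@face="verdana" and @color="#ffffff" and @size="2"]')

elements = doc.xpath(base)            # list of lxml Element objects
texts = doc.xpath(base + '/text()')   # direct text nodes only; the <a> markup is lost
nodes = doc.xpath(base + '/node()')   # text nodes and Elements interleaved

print(elements)
print(texts)
print(nodes)
```

The last list is where the mixture comes from: text children come back as strings, while the nested <a> comes back as an Element.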
For example, using XPath, I would like to get a string like
... something something <a href="url">inside</a> something ...
from a node such as
<font face="verdana" color="#ffffff" size="2"><a href="url">inside</a> something</font>