How to get full node content using xpath & lxml?

I am using lxml's xpath to extract parts of a web page. I am trying to get the contents of a <font> tag, including the HTML tags nested inside it. If I use

//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]

I get the right number of nodes, but they are returned as lxml objects (<Element font at 0x101fe5eb0>).

If I use

//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]/text()

I get exactly what I want, except that I lose the HTML markup contained in the <font> nodes.

If I use

//td[@valign="top"]/p[1]/font[@face="verdana" and @color="#ffffff" and @size="2"]/node()

I get a mixture of text and lxml elements! (e.g. something something <Element a at 0x102ac2140> something)

Is it possible to use a pure XPath query to retrieve the contents of the <font> nodes, or even to make lxml return the contents as a string from the .xpath() method, rather than an lxml object?

To clarify, using XPath, the result I am after is:

... something something <a href="url">inside</a> something - ...

from a node such as:

<font face="verdana" color="#ffffff" size="2"><a href="url">inside</a> something</font>

I am not sure I understand, but is this close to what you are looking for?

import io
import lxml.etree as le

content = '''\
<font face="verdana" color="#ffffff" size="2"><a href="url">inside</a> something</font>
'''
doc = le.parse(io.StringIO(content))

# Select the element children of the matching <font> node
xpath = '//font[@face="verdana" and @color="#ffffff" and @size="2"]/child::*'
x = doc.xpath(xpath)
# tostring() serializes each element together with its trailing (tail) text
print([le.tostring(el, encoding="unicode") for el in x])
# ['<a href="url">inside</a> something']

"Is it possible to use a pure XPath query to retrieve the contents of the <font> nodes, or even make lxml return a string of content from the .xpath() method, rather than an lxml object?"

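Short answer: no.

XPath does not work on "tags" but on a tree of nodes, so a query can select nodes but cannot return their serialized markup. This cannot be done with pure XPath.

What you need is the node's outerXML, which has to be provided by the hosting language (here, lxml).

Note: in lxml, tostring() returns the outerXML of a node.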


Source: https://habr.com/ru/post/1773413/

