Lxml xpath returns an empty list

Question

Given a file text.txt with the following content:

<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" class="pc chrome win psc_dir-ltr psc_form-xlarge" dir="ltr" lang="en">
<title>Some Title</title>
</html>

If I run:

from lxml import etree
html = etree.parse('text.txt')
result = html.xpath('//title')
print(result)

I will get an empty list. I assume this has something to do with the namespace, but I can't figure out how to fix it.

+4

python xpath web-scraping lxml

jhh phi Jul 25 '17 at 5:33


3 answers

You can also use the HTML parser:

from lxml import etree
parser = etree.HTMLParser()
html = etree.parse('text.txt', parser)
result = html.xpath('//title')
print(result)
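
To illustrate why the parser choice matters, here is a minimal sketch that parses the question's markup with both parsers. It assumes the markup is held in an inline string (standing in for text.txt, with the class attribute dropped for brevity). With the default XML parser the elements end up in the XHTML namespace, so an unprefixed //title matches nothing. Binding a prefix through the namespaces argument of xpath() is an alternative to switching parsers; the prefix name x is arbitrary.

```python
from lxml import etree

# Inline stand-in for text.txt (an assumption for this sketch).
XHTML = (
    '<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" lang="en">'
    '<title>Some Title</title>'
    '</html>'
)

# Default XML parser: every element is placed in the XHTML namespace.
xml_root = etree.fromstring(XHTML)
print(xml_root.tag)               # {http://www.w3.org/1999/xhtml}html
plain = xml_root.xpath('//title')
print(plain)                      # []

# Option 1: keep the XML parser and bind a prefix to the namespace URI.
ns = {'x': 'http://www.w3.org/1999/xhtml'}
namespaced = xml_root.xpath('//x:title/text()', namespaces=ns)
print(namespaced)                 # ['Some Title']

# Option 2: use the HTML parser, which does not namespace the elements.
html_root = etree.fromstring(XHTML, etree.HTMLParser())
from_html = html_root.xpath('//title/text()')
print(from_html)                  # ['Some Title']
```

The namespace-aware query is probably the better fit when the input is known to be well-formed XHTML; the HTML parser is more forgiving of real-world markup.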

+1

PRMoureu Jul 25 '17 at 5:57


You can do the following:

from lxml import etree
parser = etree.HTMLParser()
html = etree.parse('text.txt', parser)
result = html.xpath('//title/text()')
print(result)

Output:

['Some Title']

+1

saul Jul 25 '17 at 6:09


James schinner · Accepted Answer · 2017-07-25T05:56:08+0000

Try building the tree with the HTML parser. Also note that if text.txt is a file, its contents need to be read first:

with open('text.txt', 'r', encoding='utf8') as f:
    text_html = f.read()

Then build the tree like this:

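from lxml import etree, html

def build_lxml_tree(_html):
    # Parse the HTML string into a root element using lxml's HTML parser.
    tree = html.fromstring(_html)
    # Wrap the root element in an ElementTree.
    tree = etree.ElementTree(tree)
    return tree

tree = build_lxml_tree(text_html)
result = tree.xpath('//title')
print(result)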
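
As a possible shortcut, lxml.html.parse() accepts a file path and returns an ElementTree directly, so the separate read step can be skipped. The sketch below is an illustration under assumptions: it writes the question's markup (with the class attribute dropped) to a temporary text.txt so it runs on its own, and titles_from_file is a hypothetical helper name, not part of lxml.

```python
import os
import tempfile

from lxml import html

# Stand-in for the contents of text.txt (an assumption for this sketch).
SAMPLE = (
    '<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" lang="en">'
    '<title>Some Title</title>'
    '</html>'
)

def titles_from_file(path):
    """Hypothetical helper: parse an HTML file and return its title text."""
    tree = html.parse(path)  # returns an ElementTree built by the HTML parser
    return tree.xpath('//title/text()')

# Write the sample to a temporary file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'text.txt')
    with open(path, 'w', encoding='utf8') as f:
        f.write(SAMPLE)
    titles = titles_from_file(path)

print(titles)  # ['Some Title']
```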
