    import urllib2
    from lxml import etree

    url = 'some_url'

    # Get URL
    test = urllib2.urlopen(url)
    page = test.read()

    # Parse the page so the HTML inside the table tag can be selected
    tree = etree.HTML(page)

    # XPath selector; xpath() returns a list of matching elements
    table = tree.xpath("xpath_here")
    res = etree.tostring(table[0])
res now holds the HTML code of the table. This worked for me.
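You can retrieve the text content of tags with findtext() or an XPath text() selector, and the tags themselves, including their contents, with tostring(). Since xpath() returns a list, pick an element before serializing it:

    div = tree.xpath("//div")
    div_res = etree.tostring(div[0])

    # Text of the first matching element
    text = tree.findtext(".//content")

or

    # List of text nodes of all matching elements
    text = tree.xpath("//content/text()")

    div_3 = tree.xpath("//content")
    div_3_res = etree.tostring(div_3[0]).strip('<content>').rstrip('</')

This last line, using the strip method, is not nice: strip() removes a set of characters rather than a prefix, so it can also eat characters from the start or end of the content itself. It does work for simple cases.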
d3day Aug 19 '12 at 1:14