lxml.html can parse fragments:
    from lxml import html

    s = """<tag1>
     <tag2>
     </tag2>
    </tag1>
    <tag1>
     <tag3/>
    </tag1>"""

    doc = html.fromstring(s)
    for thing in doc:
        print(thing)
        for other in thing:
            print(other)

    """
    <Element tag1 at 0x3411a80>
    <Element tag2 at 0x3428990>
    <Element tag1 at 0x3428930>
    <Element tag3 at 0x3411a80>
    """

Courtesy of this SO answer.
And if there is more than one level of nesting:
    def flatten(nested):
        """Recursively flatten nested elements.

        Yields individual elements in document order.
        """
        for thing in nested:
            yield thing
            # Descend into the children of each element.
            for other in flatten(thing):
                yield other

    doc = html.fromstring(s)
    for thing in flatten(doc):
        print(thing)
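As a cross-check on the recursive generator, lxml elements also provide `iterdescendants()`, which walks the same descendants in document order. The sketch below compares the two; the sample markup (including `tag4`) is a made-up input for illustration, not the string from the question.

```python
from lxml import html

# Hypothetical sample input with two levels of nesting.
s = "<tag1><tag2><tag4></tag4></tag2></tag1><tag1><tag3></tag3></tag1>"


def flatten(nested):
    """Recursively yield every descendant element in document order."""
    for thing in nested:
        yield thing
        for other in flatten(thing):
            yield other


doc = html.fromstring(s)

# Tags visited by the hand-written recursive generator.
recursive_tags = [el.tag for el in flatten(doc)]
# Tags visited by lxml's own descendant iterator.
builtin_tags = [el.tag for el in doc.iterdescendants()]

print(recursive_tags)
print(recursive_tags == builtin_tags)
```

If the two lists match on your input, `doc.iterdescendants()` can stand in for `flatten(doc)` without the extra helper.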
Similarly, lxml.etree.HTML will parse this. It adds html and body tags:

    from lxml import etree

    d = etree.HTML(s)
    for thing in d.iter():
        print(thing)

    """
    <Element html at 0x3233198>
    <Element body at 0x322fcb0>
    <Element tag1 at 0x3233260>
    <Element tag2 at 0x32332b0>
    <Element tag1 at 0x322fcb0>
    <Element tag3 at 0x3233148>
    """
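Because `etree.HTML` returns the added `html` root rather than the fragment itself, one way to get back to the original top-level elements is to look up `body` and iterate its children. This is a minimal sketch with a made-up sample string; the variable names are only illustrative.

```python
from lxml import etree, html

# Hypothetical sample fragment with two top-level elements.
s = "<tag1><tag2></tag2></tag1><tag1><tag3></tag3></tag1>"

# etree.HTML builds a full document, so the root is <html>.
full = etree.HTML(s)
body = full.find("body")

# The fragment's own top-level elements are the children of <body>.
top_level = [el.tag for el in body]

# html.fromstring returns a wrapper element whose children are the same elements.
frag = html.fromstring(s)
frag_children = [el.tag for el in frag]

print(full.tag)
print(top_level)
print(frag_children)
```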