If you are parsing HTML with regular expressions, you are most likely doing it wrong, unless you are writing a one-off script for fixed, controlled content. If it has to work on arbitrary HTML input, how will you handle something like <a title='growth > 8%' href='#something'>?
In any case, the following works for me:
    >>> import re
    >>> re.split('(<[^>]*>)', '<body><table><tr><td>')[1::2]
    ['<body>', '<table>', '<tr>', '<td>']
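To make the caveat about the anchor tag concrete, here is a rough sketch comparing the same regex with the standard library's html.parser on that input. The TagCollector class and the variable names are my own, purely for illustration; this is one possible approach, not the only way to do it.

```python
import re
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records each start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tags.append((tag, dict(attrs)))


html = "<a title='growth > 8%' href='#something'>"

# The regex stops at the first '>', which sits inside the quoted attribute.
regex_tags = re.split('(<[^>]*>)', html)[1::2]
print(regex_tags)      # ["<a title='growth >"]

# The parser understands quoting, so the attribute value survives intact.
collector = TagCollector()
collector.feed(html)
collector.close()
print(collector.tags)  # [('a', {'title': 'growth > 8%', 'href': '#something'})]
```

The regex cuts the tag off in the middle of the title attribute, while the parser returns the whole tag with both attributes.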
gb. Oct 23 '11 at 2:54