I would get rid of Regex and have a look at Beautiful Soup .
findAll(True)All tags found in your source are listed.
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(source)
allTags = soup.findAll(True)
[tag.name for tag in allTags ]
[u'em', u'label']
you just need to remove possible duplicates and view the tag lists.
This snippet verifies that all source tags are present in the target tags.
from BeautifulSoup import BeautifulSoup
def get_tags_set(source):
soup = BeautifulSoup(source)
all_tags = soup.findAll(True)
return set([tag.name for tag in all_tags])
def verify(tags_source_orig, tags_source_to_verify):
return tags_source_orig == set.intersection(tags_source_orig, tags_source_to_verify)
source= '<label>What\ your name</label><label>What\ your name</label><em>Hello</em>'
source_to_verify= '<em>Hello</em><label>What\ your name</label><label>What\ your name</label>'
print verify(get_tags_set(source),get_tags_set(source_to_verify))
source
share