Match HTML tags on two lines using regex in Python

I want to verify that the HTML tags present in the source string are also present in the target string.

For instance:

>> source = '<em>Hello</em><label>What your name</label>'
>> verify_target('<em>Hi</em><label>My name is Jim</label>')
True
>> verify_target('<label>My name is Jim</label><em>Hi</em>')
True
>> verify_target('<em>Hi<label>My name is Jim</label></em>')
False

I would drop the regex and have a look at Beautiful Soup.
find_all(True) lists all the tags found in your source.

from bs4 import BeautifulSoup

soup = BeautifulSoup(source, 'html.parser')
all_tags = soup.find_all(True)  # True matches every tag
print([tag.name for tag in all_tags])
# ['em', 'label']

You then just need to remove possible duplicates and compare the tag lists.

This snippet verifies that all source tags are present in the target tags.

from bs4 import BeautifulSoup

def get_tags_set(source):
    # Collect the distinct tag names that occur in an HTML string.
    soup = BeautifulSoup(source, 'html.parser')
    return {tag.name for tag in soup.find_all(True)}

def verify(tags_source_orig, tags_source_to_verify):
    # True if every tag name from the original also occurs in the other set.
    return tags_source_orig <= tags_source_to_verify

source = '<label>What your name</label><label>What your name</label><em>Hello</em>'
source_to_verify = '<em>Hello</em><label>What your name</label><label>What your name</label>'
print(verify(get_tags_set(source), get_tags_set(source_to_verify)))

, , , html , , .

HTMLParser, . , .
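For what it's worth, here is a minimal sketch of the standard-library route, assuming Python 3's html.parser.HTMLParser and well-formed input in which every tag is explicitly closed. The names TagPathCollector, tag_paths and the two-argument verify_target are made up for illustration and are not taken from the answers. It records each tag together with the tags that enclose it, so nested and sibling tags are told apart, which is what the third example in the question seems to ask for.

```python
from html.parser import HTMLParser


class TagPathCollector(HTMLParser):
    """Record every start tag together with the tags that enclose it."""

    def __init__(self):
        super().__init__()
        self._stack = []    # currently open tags, outermost first
        self.paths = set()  # e.g. {('em',), ('em', 'label')}

    def handle_starttag(self, tag, attrs):
        # Assumption: every tag is closed later (no bare <br>, <img>, ...).
        self._stack.append(tag)
        self.paths.add(tuple(self._stack))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass


def tag_paths(html):
    """Return the set of nesting paths found in an HTML string."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def verify_target(source, target):
    """True if every tag path in source also occurs in target."""
    return tag_paths(source) <= tag_paths(target)


source = '<em>Hello</em><label>What your name</label>'
print(verify_target(source, '<em>Hi</em><label>My name is Jim</label>'))
print(verify_target(source, '<label>My name is Jim</label><em>Hi</em>'))
print(verify_target(source, '<em>Hi<label>My name is Jim</label></em>'))
```

With the strings from the question, the first two calls should print True and the third False, because in the third target the label only appears inside em. Tag order among siblings and repeated tags are deliberately ignored here.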


Source: https://habr.com/ru/post/1741892/

