
    Regex for getting matches within a group

    I do not know if the following is possible. Suppose I have the following text:

    <ul class="yes">
        <li><img src="whatever1"></li>
        <li><img src="whatever2"></li>
        <li><img src="whatever3"></li>
        <li><img src="whatever4"></li>
    </ul>
    <ul class="no">
        <li><img src="whatever5"></li>
        <li><img src="whatever6"></li>
        <li><img src="whatever7"></li>
        <li><img src="whatever8"></li>
    </ul>
    

    I would like to match each img src inside the ul with the class yes. I want a single regex to return:

    whatever1
    whatever2
    whatever3
    whatever4
    

    How can I combine two regular expressions like these into a single regular expression?

    <ul class="yes">(.+?)<\/ul>
    <img src="(whatever.+?)">
    
    1 answer

    Regex is notoriously difficult to use for parsing XML-like markup. It is better to drop the idea and fall back on a proper HTML parser, for example BeautifulSoup4:

    import bs4
    
    html = """
    <ul class="yes">
        <li><img src="whatever1"></li>
        <li><img src="whatever2"></li>
        <li><img src="whatever3"></li>
        <li><img src="whatever4"></li>
    </ul>
    <ul class="no">
        <li><img src="whatever5"></li>
        <li><img src="whatever6"></li>
        <li><img src="whatever7"></li>
        <li><img src="whatever8"></li>
    </ul>
    """
    
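    # Name the parser explicitly; otherwise bs4 picks one and emits a warning.
    soup = bs4.BeautifulSoup(html, 'html.parser')
    
    def match_imgs(tag):
        # Keep <img> tags whose grandparent (img -> li -> ul) is <ul class="yes">.
        # bs4 returns the class attribute as a list, hence ['yes'];
        # .get() avoids a KeyError for a <ul> without a class attribute.
        return tag.name == 'img' \
            and tag.parent.parent.name == 'ul' \
            and tag.parent.parent.get('class') == ['yes']
    
    imgs = soup.find_all(match_imgs)
    print(imgs)
    
    # Pull the src attribute out of each matched tag.
    whatevers = [i['src'] for i in imgs]
    print(whatevers)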
    

    Output:

    [<img src="whatever1"/>, <img src="whatever2"/>, <img src="whatever3"/>,
    <img src="whatever4"/>]
    
    [u'whatever1', u'whatever2', u'whatever3', u'whatever4']
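
    The same selection can probably be written more compactly with a CSS selector passed to BeautifulSoup's select() method. This is only a sketch: the shortened html string and the choice of 'html.parser' are assumptions made for illustration. Note that ul.yes also matches a ul carrying additional classes, which is slightly looser than the ['yes'] comparison above.

```python
import bs4

# Shortened sample markup, assumed here for illustration.
html = """
<ul class="yes">
    <li><img src="whatever1"></li>
    <li><img src="whatever2"></li>
</ul>
<ul class="no">
    <li><img src="whatever5"></li>
</ul>
"""

# 'html.parser' ships with Python; naming it explicitly is a choice made here.
soup = bs4.BeautifulSoup(html, "html.parser")

# "ul.yes > li > img" mirrors the img -> li -> ul chain checked by match_imgs.
whatevers = [img["src"] for img in soup.select("ul.yes > li > img")]
print(whatevers)
```

    The child combinator (>) keeps the match as strict as the parent.parent check; a plain "ul.yes img" would also accept images nested more deeply.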
    
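    If a regex really is unavoidable, one option is not to merge the two patterns at all but to apply them in sequence: first capture the ul block, then search only inside the captured group. The sketch below uses Python's re module and the two patterns from the question (with the slash left unescaped). It assumes the markup looks exactly like the sample, with the same attribute order and quoting and no nested lists, and it will break on anything less regular.

```python
import re

html = """
<ul class="yes">
    <li><img src="whatever1"></li>
    <li><img src="whatever2"></li>
    <li><img src="whatever3"></li>
    <li><img src="whatever4"></li>
</ul>
<ul class="no">
    <li><img src="whatever5"></li>
    <li><img src="whatever6"></li>
    <li><img src="whatever7"></li>
    <li><img src="whatever8"></li>
</ul>
"""

# Step 1: capture the body of the first <ul class="yes"> block.
# re.DOTALL lets "." match newlines, since the block spans several lines.
block = re.search(r'<ul class="yes">(.+?)</ul>', html, re.DOTALL)

# Step 2: collect the src values, searching inside the captured group only.
srcs = re.findall(r'<img src="(whatever.+?)">', block.group(1)) if block else []
print(srcs)
```

    Running the second pattern against block.group(1) rather than the whole document is what keeps whatever5 through whatever8 out of the result.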

    Source: https://habr.com/ru/post/1548391/

