Tagging an HTML page using Python 2.7 with BeautifulSoup

I am trying to parse an HTML page with the following format:

<img class="outer" id="first" />
<div class="content" .../>
<div class="content" .../>
<div class="content" />
<img class="outer" id="second" />
<div class="content" .../>
<div class="content" .../>
<img class="outer" id="third" />
<div class="content" .../>
<div class="content" .../>

While iterating over the div tags, I want to find out whether the current div tag falls under the img tag with the id "first", "second" or "third". Is there any way to do this? I have a list of img blocks and a list of div blocks:

 img_blocks = soup.find_all('img', attrs={'class':'outer'})
 div_Blocks = soup.find_all('div', attrs={'class':'content'})
2 answers

Use .find_previous_sibling():

 >>> for divtag in div_Blocks:
 ...     print divtag.find_previous_sibling('img')
 ...
 <img class="outer" id="first"/>
 <img class="outer" id="first"/>
 <img class="outer" id="first"/>
 <img class="outer" id="second"/>
 <img class="outer" id="second"/>
 <img class="outer" id="third"/>
 <img class="outer" id="third"/>
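If only the id is needed rather than the whole img tag, the same call can be used to pair each div with the id of the img before it. This is a minimal sketch, not the asker's real page: the sample markup and the div text ("a", "b", "c") are made up for illustration, and it assumes the img and div tags are siblings, as in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the question; the div text is invented.
html = (
    '<img class="outer" id="first" />'
    '<div class="content">a</div>'
    '<div class="content">b</div>'
    '<img class="outer" id="second" />'
    '<div class="content">c</div>'
)

soup = BeautifulSoup(html, 'html.parser')

pairs = []
for divtag in soup.find_all('div', attrs={'class': 'content'}):
    # Nearest preceding sibling that is an <img class="outer">, or None.
    img = divtag.find_previous_sibling('img', attrs={'class': 'outer'})
    img_id = img['id'] if img is not None else None
    pairs.append((img_id, divtag.get_text()))
```

Here pairs holds one (img id, div text) tuple per div, with None as the id for any div that has no img before it.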

Not from your current starting point. You need to iterate over all the tags, or at least over the tags of both types. If the tag is an img, save its id; if it is a div, the currently saved id tells you which container you are in. NB: you can pass a compiled re pattern to BeautifulSoup to select only those two tag types.

At the moment you are throwing away the context by retrieving each tag type on its own.
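The single-pass idea could look roughly like the following. It is a sketch under assumptions: the sample markup and the div text are invented, and the variable names current_id and owners are only illustrative.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the question; the div text is invented.
html = (
    '<img class="outer" id="first" />'
    '<div class="content">a</div>'
    '<div class="content">b</div>'
    '<img class="outer" id="second" />'
    '<div class="content">c</div>'
)

soup = BeautifulSoup(html, 'html.parser')

current_id = None   # id of the most recently seen <img class="outer">
owners = []         # (img id, div text) in document order

# A compiled pattern as the tag name selects img and div tags in one pass.
for tag in soup.find_all(re.compile(r'^(img|div)$')):
    classes = tag.get('class', [])
    if tag.name == 'img' and 'outer' in classes:
        current_id = tag.get('id')
    elif tag.name == 'div' and 'content' in classes:
        owners.append((current_id, tag.get_text()))
```

Because find_all returns tags in document order, this relies only on which img came last, so it should also cope with divs that are not direct siblings of the img.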


Source: https://habr.com/ru/post/1488926/

