Find next siblings until a certain one using BeautifulSoup

The web page looks something like this:

    <h2>section1</h2>
    <p>article</p>
    <p>article</p>
    <p>article</p>
    <h2>section2</h2>
    <p>article</p>
    <p>article</p>
    <p>article</p>

How can I find each section together with its articles? That is, after finding an h2, find its next siblings until the next h2.

If the web page looked like this (as it usually does):

    <div>
        <h2>section1</h2>
        <p>article</p>
        <p>article</p>
        <p>article</p>
    </div>
    <div>
        <h2>section2</h2>
        <p>article</p>
        <p>article</p>
        <p>article</p>
    </div>

I could write code such as:

    for section in soup.findAll('div'):
        ...
        for post in section.findAll('p'):
            ...

But what should I do with the first webpage if I want to get the same result?

1 answer

I think you can do something like this:

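    for section in soup.findAll('h2'):
        nextNode = section
        while True:
            # Step to the next node at the same level as the heading
            nextNode = nextNode.nextSibling
            try:
                tag_name = nextNode.name
            except AttributeError:
                # No more siblings (None), or a node without a tag name
                tag_name = ""
            if tag_name == "p":
                print(nextNode.string)
            else:
                # Anything other than <p> ends the current section
                print("*****")
                break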

Given:

    <h2>section1</h2>
    <p>article1</p>
    <p>article2</p>
    <p>article3</p>
    <h2>section2</h2>
    <p>article4</p>
    <p>article5</p>
    <p>article6</p>

Output:

    article1
    article2
    article3
    *****
    article4
    article5
    article6
    *****
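
For what it's worth, here is a rough sketch of the same idea for the newer bs4 package, assuming it is installed. When the markup has whitespace or newlines between tags, those show up as text siblings rather than `<p>` tags, so a loop that stops at the first non-`<p>` node may end a section too early. This version skips text nodes and stops only at the next heading. The helper name `group_by_heading` and its parameters are made up for illustration, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, Tag

html = """
<h2>section1</h2>
<p>article1</p>
<p>article2</p>
<p>article3</p>
<h2>section2</h2>
<p>article4</p>
<p>article5</p>
<p>article6</p>
"""


def group_by_heading(soup, heading="h2", item="p"):
    """Hypothetical helper: map each heading's text to the texts of the
    `item` tags that follow it, up to the next heading."""
    sections = {}
    for h in soup.find_all(heading):
        items = []
        for sibling in h.next_siblings:
            if not isinstance(sibling, Tag):
                continue  # skip whitespace and other text nodes
            if sibling.name == heading:
                break  # the next section starts here
            if sibling.name == item:
                items.append(sibling.get_text())
        sections[h.get_text()] = items
    return sections


soup = BeautifulSoup(html, "html.parser")
sections = group_by_heading(soup)

for title, articles in sections.items():
    print(title)
    for article in articles:
        print("  " + article)
```

Unlike the loop above, this keeps going past tags that are neither `<p>` nor `<h2>` instead of treating them as the end of the section; whether that is what you want depends on the page.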

Source: https://habr.com/ru/post/921285/

