The web page looks something like this:
<h2>section1</h2> <p>article</p> <p>article</p> <p>article</p> <h2>section2</h2> <p>article</p> <p>article</p> <p>article</p>
How can I find each section together with the articles in it? That is, after finding an h2, collect its next siblings
until the next h2 is reached.
If the web page looked like this (as it usually does):
<div> <h2>section1</h2> <p>article</p> <p>article</p> <p>article</p> </div> <div> <h2>section2</h2> <p>article</p> <p>article</p> <p>article</p> </div>
I could write code such as:
for section in soup.findAll('div'):
    ...
    for post in section.findAll('p'):
        ...
But what should I do with the first web page, where the h2 and p tags are all flat siblings, if I want to get the same result?
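To make the idea of "next siblings until the next h2" concrete, here is a rough sketch of what I have in mind, assuming BeautifulSoup 4 (the bs4 package) and the built-in html.parser. The function name sections_with_articles is just a name made up for this example, and this is only one possible way to do it:

```python
from bs4 import BeautifulSoup, Tag

html = """<h2>section1</h2> <p>article</p> <p>article</p> <p>article</p>
<h2>section2</h2> <p>article</p> <p>article</p> <p>article</p>"""


def sections_with_articles(soup):
    """Return a list of (heading_text, [paragraph_text, ...]) pairs.

    Hypothetical helper: it assumes the h2 and p tags are flat siblings,
    as in the first page layout.
    """
    result = []
    for heading in soup.find_all('h2'):
        articles = []
        # next_siblings yields every following node, including the
        # whitespace strings between tags, so skip anything that is not a Tag.
        for sibling in heading.next_siblings:
            if not isinstance(sibling, Tag):
                continue
            if sibling.name == 'h2':
                break  # the next section starts here
            if sibling.name == 'p':
                articles.append(sibling.get_text())
        result.append((heading.get_text(), articles))
    return result


soup = BeautifulSoup(html, 'html.parser')
for title, articles in sections_with_articles(soup):
    print(title, articles)
```

Each h2 is paired only with the p tags that follow it directly, which should give the same grouping as the div-based loop does on the second layout.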