Access the next sibling <li> element using BeautifulSoup
I am completely new to web parsing with Python / BeautifulSoup. Part of my HTML looks like this:
<div id="pages">
<ul>
<li class="active"><a href="example.com">Example</a></li>
<li><a href="example.com">Example</a></li>
<li><a href="example1.com">Example 1</a></li>
<li><a href="example2.com">Example 2</a></li>
</ul>
</div>
I need to visit every link (basically every <li> element) until there are no more <li> tags. Each time a link is clicked, the corresponding <li> element gets the class "active". My code is:
from bs4 import BeautifulSoup
import urllib2
import re
landingPage = urllib2.urlopen('somepage.com').read()
soup = BeautifulSoup(landingPage)
pageList = soup.find("div", {"id": "pages"})
page = pageList.find("li", {"class": "active"})
This code gives me the first <li> item in the list. My logic is to keep checking whether next_sibling is not None. If it is not None, I make an HTTP request to the href attribute of the <a> tag in that sibling <li>. That takes me to the next page, and so on until there are no more pages.
However, I cannot work out how to get the next_sibling of page. Is it something like page.next_sibling.get("href")?
Have you tried dir(page) or looked at the documentation? If so, did you notice .find_next_sibling()?
from bs4 import BeautifulSoup
import urllib2
import re
landingPage = urllib2.urlopen('somepage.com').read()
soup = BeautifulSoup(landingPage)
pageList = soup.find("div", {"id": "pages"})
page = pageList.find("li", {"class": "active"})
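# find_next_sibling() returns the next sibling tag, skipping the whitespace
# text node that .next_sibling would return; it is None if there is no such tag.
sibling = page.find_next_sibling()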
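The loop described in the question could be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes Python 3 and the built-in "html.parser", and it inlines the markup from the question instead of fetching a page. The names raw_sibling and hrefs are illustrative only. Note that href belongs to the <a> tag inside each <li>, so it is read from the link rather than from the sibling itself.

```python
from bs4 import BeautifulSoup

# The markup from the question, inlined so the sketch runs without a network request.
html = """
<div id="pages">
<ul>
<li class="active"><a href="example.com">Example</a></li>
<li><a href="example.com">Example</a></li>
<li><a href="example1.com">Example 1</a></li>
<li><a href="example2.com">Example 2</a></li>
</ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
page = soup.find("div", {"id": "pages"}).find("li", {"class": "active"})

# .next_sibling is the newline text node between the <li> tags, not the next <li>,
# which is why calling .get("href") on it does not work.
raw_sibling = page.next_sibling

# Walk the remaining <li> siblings and collect the href of each <a> inside them.
hrefs = []
sibling = page.find_next_sibling("li")
while sibling is not None:
    link = sibling.find("a")
    if link is not None and link.get("href"):
        hrefs.append(link["href"])
    sibling = sibling.find_next_sibling("li")

print(repr(raw_sibling))
print(hrefs)
```

In the real crawl, each collected href would be requested in turn and the new page parsed again, since the "active" class moves to a different <li> on every page.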