Access the next sibling <li> element using BeautifulSoup

Question

I am completely new to web parsing with Python/BeautifulSoup. I have an HTML page that contains (in part) the following code:

<div id="pages">
    <ul>
        <li class="active"><a href="example.com">Example</a></li>
        <li><a href="example.com">Example</a></li>
        <li><a href="example1.com">Example 1</a></li>
        <li><a href="example2.com">Example 2</a></li>
    </ul>
</div>

I need to visit every link (basically every <li> element) until there are no more <li> tags. Each time a link is clicked, the corresponding <li> element gets the class "active". My code is:

from bs4 import BeautifulSoup
import urllib2
import re

landingPage = urllib2.urlopen('somepage.com').read()
soup = BeautifulSoup(landingPage)

pageList = soup.find("div", {"id": "pages"})

page = pageList.find("li", {"class": "active"})

This code gives me the first <li> item in the list. My logic is to keep checking whether next_sibling is not None. If it is not None, I make an HTTP request to the href attribute of the <a> tag in that sibling <li>. That takes me to the next page, and so on until there are no more pages.

However, I cannot work out how to get the next_sibling of page. Is it page.next_sibling.get("href") or something similar? Can anyone help?

+4

python html beautifulsoup

user3033194 Feb 01 '16 at 22:02

2 answers

Have you looked at dir(page) or the documentation? If so, how did you miss .find_next_sibling()?

from bs4 import BeautifulSoup
import urllib2
import re

landingPage = urllib2.urlopen('somepage.com').read()
soup = BeautifulSoup(landingPage)

pageList = soup.find("div", {"id": "pages"})

page = pageList.find("li", {"class": "active"})
# find_next_sibling() returns the next sibling tag, or None if there is none
sibling = page.find_next_sibling()

+1

L3viathan Feb 01 '16 at 22:08

alecxe · Accepted Answer · 2016-02-01T22:08:35+0000

Use find_next_sibling() and be explicit about which sibling element you want to find:

next_li_element = page.find_next_sibling("li")
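
To illustrate why page.next_sibling.get("href") from the question does not work as hoped, here is a minimal sketch run against the sample markup from the question. It assumes Python 3 and the built-in "html.parser" backend, and it parses an inline string instead of fetching a page; the names HTML, raw_sibling and next_href are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
HTML = """
<div id="pages">
    <ul>
        <li class="active"><a href="example.com">Example</a></li>
        <li><a href="example.com">Example</a></li>
        <li><a href="example1.com">Example 1</a></li>
        <li><a href="example2.com">Example 2</a></li>
    </ul>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
page = soup.find("div", {"id": "pages"}).find("li", {"class": "active"})

# .next_sibling returns the very next node. In this markup that is the
# whitespace text between the two <li> tags, not a tag.
raw_sibling = page.next_sibling

# .find_next_sibling("li") skips text nodes and returns the next <li> tag.
next_li_element = page.find_next_sibling("li")

# The href attribute lives on the <a> inside the <li>, not on the <li> itself.
next_href = next_li_element.a["href"]
```

With this markup, raw_sibling is a whitespace-only string, while next_li_element is the second <li> and next_href is its link target.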

next_li_element will be None if page is the last li:

if next_li_element is None:
    # no more pages to go
    pass
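
Putting the two pieces together, the None check can serve as the stop condition of a loop. The following is only a sketch under stated assumptions: it uses Python 3 and "html.parser", walks the static sample markup from the question rather than requesting each page, and assumes every <li> contains an <a>. The helper name remaining_page_links is hypothetical.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
HTML = """
<div id="pages">
    <ul>
        <li class="active"><a href="example.com">Example</a></li>
        <li><a href="example.com">Example</a></li>
        <li><a href="example1.com">Example 1</a></li>
        <li><a href="example2.com">Example 2</a></li>
    </ul>
</div>
"""


def remaining_page_links(html):
    """Collect the href of every <li> after the active one (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    page = soup.find("div", {"id": "pages"}).find("li", {"class": "active"})

    links = []
    next_li_element = page.find_next_sibling("li")
    # Stop once find_next_sibling() returns None: no more pages to go.
    while next_li_element is not None:
        links.append(next_li_element.a["href"])
        next_li_element = next_li_element.find_next_sibling("li")
    return links


links = remaining_page_links(HTML)
```

In the workflow described in the question, each collected link would instead be requested and re-parsed, since the "active" class moves to a different <li> on every page.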
