Access the next sibling <li> element using BeautifulSoup

I am completely new to web scraping with Python / BeautifulSoup. Part of my HTML looks like this:

<div id="pages">
    <ul>
        <li class="active"><a href="example.com">Example</a></li>
        <li><a href="example.com">Example</a></li>
        <li><a href="example1.com">Example 1</a></li>
        <li><a href="example2.com">Example 2</a></li>
    </ul>
</div>

I need to visit every link (basically every <li> element) until there are no more <li> tags. Each time a link is clicked, the corresponding <li> element receives the class "active". My code is:

from bs4 import BeautifulSoup
import urllib2
import re

landingPage = urllib2.urlopen('somepage.com').read()
soup = BeautifulSoup(landingPage)

pageList = soup.find("div", {"id": "pages"})

page = pageList.find("li", {"class": "active"})

This code gives me the first <li> item in the list. My logic is to check whether its next_sibling is not None. If it is not None, I make an HTTP request to the href attribute of the <a> tag inside that sibling <li>. This leads me to the next page, and so on until there are no more pages.

However, I can't work out how to get the next_sibling of page. Would it be something like page.next_sibling.get("href")?


Use find_next_sibling() and be explicit about which sibling element you want to find:

next_li_element = page.find_next_sibling("li")

next_li_element will be None if page corresponds to the last li:

if next_li_element is None:
    # no more pages to go
    pass
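
To tie this back to the question's loop, here is a minimal sketch, assuming Python 3 and the built-in "html.parser" backend. The helper name remaining_links is hypothetical, and the markup is the sample from the question rather than a fetched page. It only walks the list on a single page; on a real site you would request each href and re-parse the response, since the "active" class moves with every page.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; a real script would fetch this.
HTML = """
<div id="pages">
    <ul>
        <li class="active"><a href="example.com">Example</a></li>
        <li><a href="example.com">Example</a></li>
        <li><a href="example1.com">Example 1</a></li>
        <li><a href="example2.com">Example 2</a></li>
    </ul>
</div>
"""


def remaining_links(html):
    """Return the href of every <li> that follows the active one (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    page_list = soup.find("div", {"id": "pages"})
    page = page_list.find("li", {"class": "active"})

    links = []
    next_li = page.find_next_sibling("li")
    while next_li is not None:
        # The href lives on the <a> inside the <li>, not on the <li> itself.
        anchor = next_li.find("a")
        if anchor is not None and anchor.get("href"):
            links.append(anchor["href"])
        next_li = next_li.find_next_sibling("li")
    return links


links = remaining_links(HTML)
print(links)
```

The loop ends as soon as find_next_sibling("li") returns None, which is the "no more pages" condition described above.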

Have you looked at dir(page) or the documentation? If so, how did you miss .find_next_sibling()?

from bs4 import BeautifulSoup
import urllib2
import re

landingPage = urllib2.urlopen('somepage.com').read()
soup = BeautifulSoup(landingPage)

pageList = soup.find("div", {"id": "pages"})

page = pageList.find("li", {"class": "active"})
sibling = page.find_next_sibling()
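
As a side note on why page.next_sibling.get("href") from the question does not work as hoped, here is a small sketch, assuming Python 3 and the "html.parser" backend, with a shortened version of the question's markup. The .next_sibling attribute can return the whitespace text node between two tags, whereas find_next_sibling() skips text and returns the next tag. Either way, the href sits on the <a> inside the <li>.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Shortened markup in the shape of the question's list, with a newline between the items.
html = (
    '<ul>\n'
    '<li class="active"><a href="example.com">Example</a></li>\n'
    '<li><a href="example1.com">Example 1</a></li>\n'
    '</ul>'
)
soup = BeautifulSoup(html, "html.parser")
page = soup.find("li", {"class": "active"})

raw = page.next_sibling                     # the newline between the two <li> tags
is_text = isinstance(raw, NavigableString)  # a text node, so it has no .get()

sibling = page.find_next_sibling()          # skips text nodes, returns the next tag
li_href = sibling.get("href")               # the <li> itself carries no href
href = sibling.find("a").get("href")        # the link is on the nested <a>

print(is_text, sibling.name, li_href, href)
```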

Source: https://habr.com/ru/post/1626929/

