Beautiful soup: access to <li> elements from <ul> without id

Question

I am trying to scrape the people who have birthdays from this Wikipedia page.

Here is the existing code:

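# Python 2; the imports below are assumed, the original snippet omitted them
import urllib2
from bs4 import BeautifulSoup

hdr = {'User-Agent': 'Mozilla/5.0'}
site = "http://en.wikipedia.org/wiki/" + "january" + "_" + "1"
req = urllib2.Request(site, headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)

print soup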

It all works fine, and I get the whole HTML page, but I need specific data, and I don't know how to access it with Beautiful Soup without using an id. The <ul> tag has no id, and neither do the <li> tags. I also can't just select every <li> tag, because there are other lists on the page. Is there a specific way to target this list? (I can't use a fix that only works for this one page, because I plan to iterate over all dates and collect the birthdays for each one, and I can't guarantee that every page has the same layout as this one.)

+1

python html-parsing web-scraping beautifulsoup

Alex Chumbley Jul 16 '13 at 17:42

2 answers

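Find the span with the id Births, take the next sibling of its parent (which is the ul), and iterate over its li elements. Here is a complete example using requests: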

from bs4 import BeautifulSoup as Soup, Tag

import requests


response = requests.get("http://en.wikipedia.org/wiki/January_1")
soup = Soup(response.content)

births_span = soup.find("span", {"id": "Births"})
births_ul = births_span.parent.find_next_sibling()

for item in births_ul.findAll('li'):
    if isinstance(item, Tag):
        print item.text

It prints:

871 – Zwentibold, Frankish son of Arnulf of Carinthia (d. 900)
1431 – Pope Alexander VI (d. 1503)
1449 – Lorenzo de' Medici, Italian politician (d. 1492)
1467 – Sigismund I the Old, Polish king (d. 1548)
1484 – Huldrych Zwingli, Swiss pastor and theologian (d. 1531)
1511 – Henry, Duke of Cornwall (d. 1511)
1516 – Margaret Leijonhufvud, Swedish wife of Gustav I of Sweden (d. 1551)
...
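
Since the question mentions that other date pages may be laid out differently, here is a small offline sketch of one thing to watch for. With no arguments, find_next_sibling() returns the next sibling tag, whatever it is, so passing the tag name is a bit more defensive. The HTML below is a hypothetical, simplified stand-in for the real page (the extra <p> note is an assumption for illustration), and the sketch uses Python 3 print syntax.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: a note sits between the heading and the list.
HTML = """
<h2><span id="Births">Births</span></h2>
<p>Some introductory note.</p>
<ul>
  <li>1431 – Pope Alexander VI (d. 1503)</li>
  <li>1449 – Lorenzo de' Medici, Italian politician (d. 1492)</li>
</ul>
<h2><span id="Deaths">Deaths</span></h2>
<ul>
  <li>1515 – Louis XII of France (b. 1462)</li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")
heading = soup.find("span", {"id": "Births"}).parent

# With no arguments, the next sibling tag is returned, here the <p> note.
print(heading.find_next_sibling().name)

# Naming the tag skips the intervening <p> and lands on the Births list.
births_ul = heading.find_next_sibling("ul")
births = [li.get_text() for li in births_ul.find_all("li")]
print(births)
```

On a page where the list directly follows the heading, both calls return the same element.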

+6

alecxe Jul 16 '13 at 17:51

Blender · Accepted Answer · 2013-07-16T17:46:10+0000

First, find the heading of the Births section:

section = soup.find('span', id='Births').parent

Then take the <li> elements from the next <ul> after it:

births = section.find_next('ul').find_all('li')
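
For iterating over every date, the two steps above could be wrapped in a function that returns an empty list when a page has no Births span, roughly as sketched below. The names extract_births and date_page_names and the sample HTML are hypothetical, made up for illustration; the sketch uses Python 3 and works on an inline string so it runs without network access.

```python
import calendar
from bs4 import BeautifulSoup


def extract_births(html):
    """Return the text of each <li> in the first <ul> after the Births heading."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", id="Births")
    if span is None:  # this page has no element with id="Births"
        return []
    # find_next() walks forward in document order, so if the Births section
    # had no list of its own, this would return a later section's <ul>.
    ul = span.parent.find_next("ul")
    if ul is None:
        return []
    return [li.get_text().strip() for li in ul.find_all("li")]


def date_page_names(year=2012):
    """Yield page names such as 'January_1' for each day of the given year."""
    for month in range(1, 13):
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            yield "%s_%d" % (calendar.month_name[month], day)


# Hypothetical, simplified stand-in for one date page.
SAMPLE = """
<h2><span id="Events">Events</span></h2>
<ul><li>An unrelated list item</li></ul>
<h2><span id="Births">Births</span></h2>
<ul>
  <li>1431 – <a href="/wiki/Pope_Alexander_VI">Pope Alexander VI</a> (d. 1503)</li>
  <li>1484 – Huldrych Zwingli, Swiss pastor and theologian (d. 1531)</li>
</ul>
"""

print(extract_births(SAMPLE))
page_names = list(date_page_names())
print(len(page_names), page_names[0], page_names[-1])
```

Each name from date_page_names() can be appended to "http://en.wikipedia.org/wiki/" and fetched as in the question; an empty result from extract_births() is a cue to inspect that page by hand.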
