I use Beautiful Soup to get the hyperlinks in the body of web pages. Here is the code I'm using:
import urllib2
from bs4 import BeautifulSoup

url = 'http://www.1914-1918.net/swb.htm'
element = 'body'
request = urllib2.Request(url)
page = urllib2.urlopen(request).read()
pageSoup = BeautifulSoup(page)
for elementSoup in pageSoup.find_all(element):
    for linkSoup in elementSoup.find_all('a'):
        print linkSoup['href']
I get an AttributeError when I try to find the hyperlinks on the swb.htm page:
AttributeError: 'NoneType' object has no attribute 'next_element'
I am sure the page has a body element and a couple of "a" elements under it. Strangely, the same code works fine for other pages (e.g. http://www.1914-1918.net/1div.htm ).
This problem has been haunting me for several days. Can someone point out what I did wrong?
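For comparison, here is a minimal cross-check that does not use Beautiful Soup at all. It is only a sketch: it assumes Python 3 (where the module is html.parser; in Python 2 it is HTMLParser), the names BodyLinkParser and body_links are made up for illustration, and the sample markup is invented rather than taken from swb.htm. If something like this lists the links on a page where the code above fails, that would suggest the problem is in how the page gets parsed rather than in the loop itself.

```python
from html.parser import HTMLParser


class BodyLinkParser(HTMLParser):
    """Collect href values of <a> tags that appear inside <body>.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag == "a" and self.in_body:
            href = dict(attrs).get("href")
            if href is not None:  # skip anchors that have no href
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False


def body_links(html):
    """Return the href of every <a> found between <body> and </body>."""
    parser = BodyLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Invented sample markup: one link outside the body, one anchor without href.
sample = """<html><head><a href="ignored.htm">x</a></head>
<body><p><a href="1div.htm">one</a> <a name="top">no href</a>
<a href="swb.htm">two</a></body></html>"""

print(body_links(sample))  # ['1div.htm', 'swb.htm']
```

Note that the anchor without an href is skipped here, whereas linkSoup['href'] in the code above would raise a KeyError for such an anchor.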