I am new to Python and am using BeautifulSoup to parse a website and extract data from it. I have the following code:
    for line in raw_data:
        d = {}
        d['name'] = line.find('div', {'class': 'torrentname'}).find('a')
        print(d['name'])
This prints:
    <a href="/ubuntu-9-10-desktop-i386-t3144211.html">
    <strong class="red">Ubuntu</strong> 9.10 desktop (i386)</a>
Normally I could extract "Ubuntu 9.10 desktop (i386)" by writing:
    d['name'] = line.find('div', {'class': 'torrentname'}).find('a').string
but because of the nested <strong> tag, .string returns None. Is there a way to remove the <strong> tags and then use .string, or is there a better approach? I tried BeautifulSoup's extract() function, but I could not get it to work.
Edit: I just realized that my solution does not work when there are two sets of <strong> tags, because the space between the words is lost. How can I fix this?
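For reference, here is a minimal sketch of one possible approach. It assumes the bs4 package (BeautifulSoup 4) rather than the older BeautifulSoup 3 import. The helper name link_text and the second sample snippet with two <strong> tags are made up for illustration. The idea is that .string only works when a tag has a single child, so the sketch joins all text fragments inside the tag with a space, using stripped_strings.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed


def link_text(tag):
    """Hypothetical helper: join every text fragment inside `tag` with one space."""
    # stripped_strings yields each descendant text node with surrounding
    # whitespace removed, skipping nodes that are whitespace only.
    return ' '.join(tag.stripped_strings)


# Markup from the question, wrapped in the div that the loop searches for.
html = '''<div class="torrentname"><a href="/ubuntu-9-10-desktop-i386-t3144211.html">
<strong class="red">Ubuntu</strong> 9.10 desktop (i386)</a></div>'''

soup = BeautifulSoup(html, 'html.parser')
link = soup.find('div', {'class': 'torrentname'}).find('a')

# <a> has several children (text, <strong>, text), so .string is None.
print(link.string)

# Joining the fragments recovers the full name.
print(link_text(link))

# Made-up sample with two adjacent <strong> tags and no space between them.
two_strong = BeautifulSoup(
    '<a><strong>Ubuntu</strong><strong>9.10</strong> desktop</a>',
    'html.parser',
).find('a')
print(link_text(two_strong))
```

One caveat with this sketch: a space is inserted between every pair of fragments, so markup that splits a single word across tags (for example <strong>Ubu</strong>ntu) would come out with an unwanted space. If that can occur in the data, get_text() without a separator may be the safer choice.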