Using SoupStrainer to parse selectively

Question

I am trying to parse a list of video game titles from a shopping site. However, the whole item list is stored inside a single div tag.

This section of the documentation supposedly explains how to parse only part of the document, but I can't work it out. My code is:

from BeautifulSoup import BeautifulSoup
import urllib
import re

url = "Some Shopping Site"
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
for a in soup.findAll('a',{'title':re.compile('.+') }):
    print a.string

At present this prints the string inside every a tag that has a non-empty title attribute. But it also picks up the items in the sidebar that are the "specials". If I could restrict it to the products list div, I would kill two birds with one stone.

Many thanks.

+3

python beautifulsoup scrape

Scraper Oct 23 '10 at 16:34

2 answers

Scraper · Answer 1 · 2010-10-24T03:58:15+0000

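I had been searching for the div with id = products, but the id should have been product_list.

Here is the final code, in case anyone comes looking: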

from BeautifulSoup import BeautifulSoup, SoupStrainer
import urllib
import re

url = "http://someplace.com"
html = urllib.urlopen(url).read()
# Parse only the div that holds the product list
product = SoupStrainer('div', {'id': 'products_list'})
soup = BeautifulSoup(html, parseOnlyThese=product)
# Print the text of every link that has a non-empty title
for a in soup.findAll('a', {'title': re.compile('.+')}):
    print a.string
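
For anyone on the newer bs4 package, here is a minimal sketch of the same idea. It assumes Python 3 and bs4, where the keyword is parse_only instead of parseOnlyThese. The sample HTML, the sidebar id, and the titled_links helper are made up for illustration; only the products_list id comes from the code above.

```python
import re
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical page: a sidebar with "specials" plus the real product list.
HTML = """
<html><body>
<div id="sidebar"><a href="/s1" title="Special offer">Special Game</a></div>
<div id="products_list">
  <a href="/g1" title="Game One">Game One</a>
  <a href="/g2" title="Game Two">Game Two</a>
  <a href="/g3">No title</a>
</div>
</body></html>
"""

def titled_links(html, div_id="products_list"):
    """Return the text of titled links found inside the div with the given id."""
    # Only the matching div and everything inside it is turned into a tree.
    only_products = SoupStrainer("div", attrs={"id": div_id})
    soup = BeautifulSoup(html, "html.parser", parse_only=only_products)
    # Keep links whose title attribute is non-empty, as in the original code.
    return [a.get_text(strip=True)
            for a in soup.find_all("a", title=re.compile(r".+"))]

print(titled_links(HTML))
```

The sidebar link is never built into the tree, so it cannot show up in the results. Note that parse_only is not supported by the html5lib parser, which is why the sketch uses html.parser.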

dusan · Answer 2 · 2010-10-23T17:58:04+0000

Find the products div first, then search for the a tags inside it:

product = soup.find('div',{'id': 'products'})
for a in product.findAll('a',{'title': re.compile('.+') }):
   print a.string
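
One caveat with this approach: find returns None when no div matches, so the loop would fail with an AttributeError if the id is wrong. A rough sketch of a guarded version follows, assuming Python 3 and bs4; the sample HTML and the links_in_div helper are hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page used only to exercise the helper below.
PAGE = """
<div id="sidebar"><a href="/s1" title="Special offer">Special Game</a></div>
<div id="products">
  <a href="/g1" title="Game One">Game One</a>
  <a href="/g2" title="">Empty title</a>
</div>
"""

def links_in_div(html, div_id="products"):
    """Parse the whole page, then search only inside the div with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    product = soup.find("div", attrs={"id": div_id})
    if product is None:
        # No such div: return nothing instead of raising AttributeError.
        return []
    return [a.get_text(strip=True)
            for a in product.find_all("a", title=re.compile(r".+"))]

print(links_in_div(PAGE))
```

Unlike SoupStrainer, this still parses the full document, so it narrows the search but does not save parsing time.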
