Using SoupStrainer to parse selectively

I am trying to parse a list of video game titles from a shopping site. However, the whole list of items is stored inside a single tag.

This section of the documentation is supposed to explain how to parse only part of a document, but I cannot work it out. My code is:

from BeautifulSoup import BeautifulSoup
import urllib
import re

url = "Some Shopping Site"
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
for a in soup.findAll('a',{'title':re.compile('.+') }):
    print a.string

This currently prints the string inside every a tag that has a non-empty title attribute. But it also picks up the items in the sidebar that are the "specials". If I could take only the product list div, I would kill two birds with one stone.

Many thanks.

2 answers

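Build a SoupStrainer for the div that holds the product list (id = products, product_list, or whatever id the page actually uses) and pass it to BeautifulSoup, so that only that div is parsed: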

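from BeautifulSoup import BeautifulSoup, SoupStrainer
import urllib
import re
import time  # needed for time.clock() below

# start time, for measuring how long the download and parse take
start = time.clock()
url = "http://someplace.com"
html = urllib.urlopen(url).read()
# keep only the div with the product list while parsing
product = SoupStrainer('div',{'id': 'products_list'})
soup = BeautifulSoup(html,parseOnlyThese=product)
# links with a non-empty title attribute
for a in soup.findAll('a',{'title':re.compile('.+') }):
    print a.string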
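
For anyone on Python 3, here is a rough sketch of the same idea with the newer bs4 package, where the constructor argument is called parse_only rather than parseOnlyThese. The sample HTML, the ids, and the product_titles helper are all made up for illustration; they are not taken from the real site.

```python
import re
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical page: the ids, links, and titles below are invented for this sketch.
SAMPLE_HTML = """
<html><body>
  <div id="sidebar">
    <a href="/s/1" title="Special Offer">Special Offer</a>
  </div>
  <div id="products_list">
    <a href="/p/1" title="Game One">Game One</a>
    <a href="/p/2" title="Game Two">Game Two</a>
    <a href="/p/3">No title attribute</a>
  </div>
</body></html>
"""


def product_titles(html, div_id="products_list"):
    """Return link texts from the div with the given id, parsing only that div."""
    # The strainer makes the parser keep only the matching div and its contents.
    only_products = SoupStrainer("div", {"id": div_id})
    soup = BeautifulSoup(html, "html.parser", parse_only=only_products)
    # '.+' needs at least one character, so links without a title are skipped.
    links = soup.find_all("a", {"title": re.compile(".+")})
    return [a.get_text(strip=True) for a in links]


titles = product_titles(SAMPLE_HTML)
print(titles)
```

With this sample the sidebar link is never parsed at all, so only the two product links with a title come back.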

Find the product div first, then look for the a tags inside it:

product = soup.find('div',{'id': 'products'})
for a in product.findAll('a',{'title': re.compile('.+') }):
   print a.string
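
One thing to watch with this approach: find returns None when no div matches, and calling findAll on None raises AttributeError. A minimal sketch with a guard, written for Python 3 and bs4 (find_all instead of findAll); the sample HTML and the titles_in_div helper are hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this sketch.
PAGE = """
<div id="sidebar"><a href="/s/1" title="Special Offer">Special Offer</a></div>
<div id="products">
  <a href="/p/1" title="Game One">Game One</a>
  <a href="/p/2" title="">Empty title</a>
</div>
"""


def titles_in_div(html, div_id="products"):
    """Parse the whole document, then search only inside the chosen div."""
    soup = BeautifulSoup(html, "html.parser")
    product = soup.find("div", {"id": div_id})
    if product is None:
        # No such div on the page: return nothing instead of crashing.
        return []
    links = product.find_all("a", {"title": re.compile(".+")})
    return [a.get_text(strip=True) for a in links]


found = titles_in_div(PAGE)
print(found)
```

Unlike the SoupStrainer version, this still parses the entire page first and only narrows the search afterwards.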

Source: https://habr.com/ru/post/1771088/

