I want to scrape a site using GAE and store the results in a Google App Engine entity

Question


I want to scrape this URL: https://www.xstreetsl.com/modules.php?searchSubmitImage_x=0&searchSubmitImage_y=0&SearchLocale=0&name=Marketplace&SearchKeyword=business&searchSubmitImage.x=0&searchSubmitImage.y=0&SearchLocale=0&SearchPriceMin=&SearchPriceMax=&SearchRatingMin=&SearchRatingMax=&sort=&dir=asc

Then I want to follow each of the result links and extract various pieces of information (for example permissions, primitives, etc.), and store the results in an entity on Google App Engine.

What is the best way to do this?

Chris

+1

python google-app-engine screen-scraping

cozza Mar 09 '10 at 3:22


2

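Python has good tools for this.

For a full crawling framework, look at scrapy. It is built on Twisted, though, which may not run on App Engine.

Otherwise, there are BeautifulSoup and Mechanize for doing it by hand.

BeautifulSoup and Mechanize should work on App Engine: httplib and urllib there are wrappers around urlfetch. [Credit: Nick Johnson]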
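
As a rough illustration of the first step (collecting the result links from a listing page), here is a minimal sketch. It uses the standard library's html.parser as a stand-in for BeautifulSoup and is written for Python 3, so it is not what this answer's author had in mind. The sample markup and the ItemID parameter are hypothetical. On App Engine the page body would come from urlfetch or urllib rather than from a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Turn relative links into absolute ones.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all absolute link URLs found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical listing markup, for illustration only.
SAMPLE = (
    '<ul>'
    '<li><a href="modules.php?name=Marketplace&amp;ItemID=1">A</a></li>'
    '<li><a href="/modules.php?name=Marketplace&amp;ItemID=2">B</a></li>'
    '<li><a>no href</a></li>'
    '</ul>'
)
links = extract_links(SAMPLE, "https://www.xstreetsl.com/modules.php")
```

Each URL in links can then be fetched in turn and passed to whatever code extracts the item details.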

+3

jkp Mar 09 '10 at 3:34

hoju · Accepted Answer · 2010-03-09T05:45:18+0000

For parsing HTML in Python, there are html5lib and BeautifulSoup.

, HTML. Google App Engine, xpath, HTML. .
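
To make the xpath idea concrete, here is a small sketch that pulls labelled values out of an item page using the XPath subset supported by the standard library's xml.etree.ElementTree (Python 3). The page structure, the "details" class, and the field labels are all hypothetical. The sample is well-formed markup so that ElementTree can parse it directly, whereas a real page would more likely be parsed with a lenient parser such as html5lib first.

```python
import xml.etree.ElementTree as ET

# Hypothetical item-page fragment, for illustration only.
SAMPLE_PAGE = """
<div id="item">
  <h1>Business Desk</h1>
  <table class="details">
    <tr><th>Permissions</th><td>Copy, Modify</td></tr>
    <tr><th>Prims</th><td>12</td></tr>
  </table>
</div>
"""


def extract_details(root):
    """Return a dict of lowercased label -> value from the assumed details table."""
    details = {}
    # ElementTree supports a limited XPath subset, including attribute tests.
    for row in root.findall(".//table[@class='details']/tr"):
        label = row.findtext("th")
        value = row.findtext("td")
        if label and value:
            details[label.strip().lower()] = value.strip()
    title = root.findtext(".//h1")
    if title:
        details["title"] = title.strip()
    return details


record = extract_details(ET.fromstring(SAMPLE_PAGE))
```

The resulting dict could then be copied field by field into the properties of whatever datastore model is defined for the scraped items.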
