I want to scrape a site using GAE and save the results to a Google datastore Entity

I want to scrape the following URL: https://www.xstreetsl.com/modules.php?searchSubmitImage_x=0&searchSubmitImage_y=0&SearchLocale=0&name=Marketplace&SearchKeyword=business&searchSubmitImage.x=0&searchSubmitImage.y=0&SearchLocale=0&SearchPriceMin=&SearchPriceMax=&SearchRatingMin=&SearchRatingMax=&sort=&dir=asc

Then I want to follow each of the links and extract various pieces of information, for example permissions, primitives, etc., and save the results to an Entity in Google App Engine.

What is the best way to go about this?

Chris


For parsing HTML in Python, the main options are html5lib and BeautifulSoup.

, HTML. Google App Engine, xpath, HTML. .
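As a rough illustration of the parsing step, here is a minimal sketch that pulls the links out of a results page. It uses the standard library's html.parser as a stand-in for html5lib or BeautifulSoup, only so that it runs without extra packages. The sample markup and the ItemID parameter are made up for the example; the real search page will look different.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href=...> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page they came from.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup, not taken from the real site.
SAMPLE = """
<ul>
  <li><a href="modules.php?name=Marketplace&amp;ItemID=1">Item one</a></li>
  <li><a href="/modules.php?name=Marketplace&amp;ItemID=2">Item two</a></li>
  <li><a>no href</a></li>
</ul>
"""

links = extract_links(SAMPLE, "https://www.xstreetsl.com/modules.php")
```

Each URL in links would then be fetched and parsed in turn. In practice you would probably also filter the list so that only item pages are followed.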


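For Python, there is a dedicated scraping framework.

It is called scrapy. It is built on Twisted, though, which may keep it from running on App Engine.

Otherwise, BeautifulSoup combined with Mechanize is a common choice.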

Both BeautifulSoup and Mechanize work on App Engine: httplib and urllib are supported there and are implemented on top of urlfetch. [Nick Johnson]
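To make the fetch-and-extract step concrete, here is a hedged sketch of scraping a single item page into a dict of fields. It is written for current Python 3 with only the standard library; the table layout, the field labels, and the example.invalid URL are assumptions for the example, and the names parse_item and scrape_item are hypothetical helpers, not part of any library.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def parse_item(html):
    """Pair up label/value cells into a dict (assumes two cells per row)."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = [c.strip() for c in parser.cells]
    # Cells alternate: label, value, label, value, ...
    return {label.lower(): value for label, value in zip(cells[0::2], cells[1::2])}


def fetch(url):
    """Download a page with the standard library."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_item(url, fetch_page=fetch):
    """Fetch one item page and return its fields plus the source URL."""
    fields = parse_item(fetch_page(url))
    fields["url"] = url
    return fields


# Hypothetical item markup, not taken from the real site.
SAMPLE_ITEM = """
<table>
  <tr><td>Permissions</td><td>Copy, Modify</td></tr>
  <tr><td>Prims</td><td>12</td></tr>
</table>
"""

# The fetcher is swapped for a stub so the example runs offline.
record = scrape_item("https://example.invalid/item/1",
                     fetch_page=lambda url: SAMPLE_ITEM)
```

On App Engine, the resulting dict could be copied onto a datastore model (for example a db.Model subclass) and saved with put(). That part is left out here because it only runs inside the App Engine environment.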


Source: https://habr.com/ru/post/1736040/

