I want to scrape the URL: https://www.xstreetsl.com/modules.php?searchSubmitImage_x=0&searchSubmitImage_y=0&SearchLocale=0&name=Marketplace&SearchKeyword=business&searchSubmitImage.x=0&searchSubmitImage.y=0&SearchLocale=0&SearchPriceMin=&SearchPriceMax=&SearchRatingMin=&SearchRatingMax=&sort=&dir=asc
Then visit each of the links and extract various pieces of information (for example permissions, primitives, etc.), and store the results in an Entity on Google App Engine.
What is the best way to go about this?
Chris
For parsing HTML in Python, look at html5lib or BeautifulSoup.
, HTML. Google App Engine, xpath, HTML. .
Python .
, scrapy. Twisted , .
BeautifulSoup Mechanize, "" .
BeautifulSoup and Mechanize should both work on App Engine: httplib and urllib are supported there, with urlfetch as the backend. [Nick Johnson]
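As a rough sketch of the fetch-and-parse step asked about in the question, the snippet below collects the links from a page so that each one can be visited in turn. It uses only the Python 3 standard library (html.parser and urllib) as a stand-in for BeautifulSoup or html5lib. The names LinkCollector, extract_links and fetch are made up for illustration, and filtering for item pages, extracting fields such as permissions, and saving to an App Engine Entity are deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects an absolute URL for every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page they came from.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all link targets found in html, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url):
    """Download a page and return its body as text (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

A typical use would be `for link in extract_links(fetch(search_url), search_url): ...`, where `search_url` is the search URL from the question and the loop body parses each item page.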
Source: https://habr.com/ru/post/1736040/