I would like to know what the best open-source library for crawling and analyzing websites is. One example would be a crawler for real estate agencies: I would like to collect information from several sites and publish it on my own site. To do this, I need to crawl the sites and extract their real estate ads.
I do a lot of scraping using the excellent Python packages urllib2, mechanize, and BeautifulSoup.
I also suggest looking at lxml and Scrapy, although I am not using them at this time (I still plan on trying Scrapy).
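The fetch-then-parse workflow these packages support can be sketched roughly as follows. This is only an illustration, not the answerer's actual code: it uses the Python 3 standard library (`urllib.request` is the successor to `urllib2`, and `html.parser` stands in for BeautifulSoup or lxml) so that it runs without extra installs. The `AdParser` class, the `parse_ads` and `fetch_ads` helpers, the `ad` CSS class, and the sample markup are all hypothetical; a real listing site will need its own selectors.

```python
from html.parser import HTMLParser
from urllib.request import urlopen  # Python 3 successor to urllib2.urlopen


class AdParser(HTMLParser):
    """Collect the text and link of every <a class="ad"> element.

    The "ad" class is an assumption made for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.ads = []         # finished ads: {"title": ..., "url": ...}
        self._current = None  # ad being read, or None outside an ad link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "ad" in classes:
            self._current = {"title": "", "url": attrs.get("href")}

    def handle_data(self, data):
        # Accumulate text only while inside an ad link.
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.ads.append(self._current)
            self._current = None


def parse_ads(html):
    """Return a list of ad dictionaries found in an HTML string."""
    parser = AdParser()
    parser.feed(html)
    parser.close()
    return parser.ads


def fetch_ads(url):
    """Download a page and extract its ads (performs a real network request)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return parse_ads(response.read().decode(charset, errors="replace"))


# Made-up listing page used instead of a live site.
SAMPLE = """
<ul>
  <li><a class="ad" href="/ads/1">2-bedroom flat, city centre</a></li>
  <li><a class="nav" href="/about">About us</a></li>
  <li><a class="ad featured" href="/ads/2">House with garden</a></li>
</ul>
"""

if __name__ == "__main__":
    for ad in parse_ads(SAMPLE):
        print(ad["url"], ad["title"])
```

With BeautifulSoup or lxml the hand-written parser class would typically shrink to a single selector query, which is the main reason to prefer those libraries for anything beyond a toy page.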
Perl also has great facilities for scraping.
PHP with cURL is a very powerful combination, especially if you want to use the results directly on a web page.
Besides Scrapy, you should also look at Parselets.
Source: https://habr.com/ru/post/1706573/