The best open source library or application for crawling and storing data on websites

I would like to know what the best open-source library for crawling and analyzing websites is. One example would be a crawling agent for an agency-style site: I want to gather information from several other sites and republish it on my own. To do this, I need to crawl real-estate sites and retrieve their ads.

+3
4 answers

I do a lot of scraping with the excellent Python urllib2, mechanize, and BeautifulSoup packages.

I also suggest looking at lxml and Scrapy, although I am not using them at the moment (I still plan to try Scrapy).

Perl also has very good scraping facilities.
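For the question's use case (pulling real-estate ads off listing pages), here is a minimal sketch of that stack under Python 3, where urllib.request replaces urllib2. The URL and the div.listing / .price selectors are made-up placeholders, not any real site's markup:

```python
# Minimal scraping sketch with urllib and BeautifulSoup.
# The URL and CSS selectors below are placeholders -- adapt them
# to the real-estate site you are actually allowed to crawl.
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

URL = "https://example.com/listings"  # hypothetical listings page

# Some sites reject requests without a User-Agent header.
req = Request(URL, headers={"User-Agent": "Mozilla/5.0 (crawler demo)"})
html = urlopen(req).read()

soup = BeautifulSoup(html, "html.parser")

# Assume each ad sits in a <div class="listing"> (an assumption, not a
# real structure); pull out the title and price of each one.
for ad in soup.select("div.listing"):
    title = ad.select_one("h2")
    price = ad.select_one(".price")
    if title and price:
        print(title.get_text(strip=True), "-", price.get_text(strip=True))
```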

+8

PHP/cURL is a very powerful combination, especially if you want to use the results directly on a web page...

+1

If you work in Python, BeautifulSoup together with urllib2 is a good combination.

lxml is another option worth looking at.

Beyond that, think about Scrapy.
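As a rough illustration of the lxml route, here is a sketch doing the same kind of extraction with XPath; again, the URL and the markup it assumes are hypothetical:

```python
# Sketch of the same extraction with lxml, which parses large pages
# quickly and supports XPath. The URL and XPath expressions are placeholders.
import urllib.request

from lxml import html

URL = "https://example.com/listings"  # hypothetical listings page

page = urllib.request.urlopen(URL).read()
tree = html.fromstring(page)

# Hypothetical markup: each ad is a <div class="listing"> with an <h2> title.
for ad in tree.xpath('//div[@class="listing"]'):
    titles = ad.xpath(".//h2/text()")
    if titles:
        print(titles[0].strip())
```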

+1

Besides Scrapy, you should also take a look at Parselets.
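For Scrapy itself, a bare-bones spider might look like the sketch below; the domain, selectors, and pagination link are assumptions for illustration only:

```python
# Minimal Scrapy spider sketch. The start URL and selectors are placeholders;
# Scrapy provides the crawling, throttling, and export machinery,
# e.g. `scrapy runspider listings_spider.py -o ads.json`.
import scrapy


class ListingsSpider(scrapy.Spider):
    name = "listings"
    start_urls = ["https://example.com/listings"]  # hypothetical start page

    def parse(self, response):
        # Hypothetical markup: each ad is a <div class="listing">.
        for ad in response.css("div.listing"):
            yield {
                "title": ad.css("h2::text").get(),
                "price": ad.css(".price::text").get(),
                "url": response.urljoin(ad.css("a::attr(href)").get() or ""),
            }

        # Follow pagination if a "next" link exists.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```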

0

Source: https://habr.com/ru/post/1706573/

