The best open source library or application for crawling and storing data on websites

Question


I would like to know the best open-source library for crawling and analyzing websites. One example would be a crawler for a real estate agency: I would like to collect information from several sites and publish it on my own site. To do this, I need to crawl the sites and extract the real estate ads.
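
The question also asks about storing what the crawler retrieves. Below is a minimal sketch of that step using Python's built-in `sqlite3` module. It assumes each ad has already been reduced to a dict with hypothetical `url`, `title` and `price` keys, and it keys rows by URL so that re-crawling a page updates the stored ad instead of duplicating it. The table layout and function names are illustrative assumptions, not part of any library mentioned in this thread.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite database with a table for scraped ads."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ads (
               url   TEXT PRIMARY KEY,  -- source page, used to deduplicate
               title TEXT NOT NULL,
               price TEXT              -- kept as scraped text; parse later if needed
           )"""
    )
    return conn


def save_ads(conn, ads):
    """Insert or update ads; re-crawling the same URL overwrites the old row."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO ads (url, title, price) "
            "VALUES (:url, :title, :price)",
            ads,
        )


if __name__ == "__main__":
    conn = open_store()
    save_ads(conn, [
        {"url": "https://example.com/ad/1", "title": "2-room flat", "price": "120000"},
        {"url": "https://example.com/ad/1", "title": "2-room flat (reduced)", "price": "110000"},
    ])
    print(conn.execute("SELECT url, title, price FROM ads").fetchall())
    # → [('https://example.com/ad/1', '2-room flat (reduced)', '110000')]
```

Passing a file path instead of `":memory:"` keeps the data between runs, which is what republishing ads on your own site would need.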


web-crawler extraction data-mining text-extraction

gyurisc Apr 17 '09 at 7:26


4 answers

PHP / cURL is a very powerful combination, especially if you want to use the results directly on a web page ...


kal3v Jun 2 '09 at 14:13


… BeautifulSoup … urllib2 …

… lxml … Google …

… Scrapy …


Bill Bell 01 … '09 at 14:57

Besides Scrapy, you should also look at Parselets.


Joseph Turian Oct 15 '09 at 10:16


Eugene Morozov · Accepted Answer · 2009-04-17T07:43:57+0000

I do a lot of scraping using the excellent Python packages urllib2, mechanize, and BeautifulSoup.
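
To illustrate the fetch-then-parse workflow described here, the following is a minimal sketch, not the answerer's code. It swaps in Python 3's `urllib.request` (the successor to urllib2) and the standard library's `html.parser` in place of BeautifulSoup so that it runs without third-party packages. The ad markup it expects (`div.ad` containing an `h2` title and a `span.price`), the User-Agent string, and all function names are hypothetical; a real listings site needs its own selectors.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch(url, timeout=10):
    """Download a page and decode it to text."""
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})  # hypothetical UA
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class AdParser(HTMLParser):
    """Collect ads from (hypothetical) markup shaped like:
    <div class="ad"><h2>Title</h2><span class="price">Price</span></div>
    """

    def __init__(self):
        super().__init__()
        self.ads = []
        self._ad_depth = 0  # >0 while inside an ad <div>; counts nested divs
        self._field = None  # which field the current text belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            if self._ad_depth:
                self._ad_depth += 1
            elif "ad" in classes:
                self._ad_depth = 1
                self.ads.append({"title": "", "price": ""})
        elif self._ad_depth and tag == "h2":
            self._field = "title"
        elif self._ad_depth and tag == "span" and "price" in classes:
            self._field = "price"

    def handle_endtag(self, tag):
        if tag == "div" and self._ad_depth:
            self._ad_depth -= 1
        elif tag in ("h2", "span"):
            self._field = None

    def handle_data(self, data):
        if self._field:
            self.ads[-1][self._field] += data


def extract_ads(html):
    """Parse an HTML string and return a list of {'title', 'price'} dicts."""
    parser = AdParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the markup.
    return [{k: " ".join(v.split()) for k, v in ad.items()} for ad in parser.ads]
```

For a live page you would call something like `extract_ads(fetch(url))`. With BeautifulSoup installed, the parser class shrinks to a few `find_all` or `select` calls, which is a large part of why the answer recommends it.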

I also suggest looking at lxml and Scrapy, although I am not using them at the moment (I still plan to try Scrapy).
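
The crawling part of the question (following links from page to page without visiting anything twice) is what a framework such as Scrapy takes care of for you, together with concurrency, retries and politeness settings. As a rough idea of what that involves, here is a toy breadth-first crawler using only the standard library. It is a hypothetical sketch, not Scrapy's implementation: it stays on the start URL's host, drops `#fragments` when deduplicating, and takes the page downloader as a parameter (any callable from URL to HTML string) so it can be exercised offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl restricted to start_url's host.

    `fetch` is any callable url -> HTML string. Returns {url: html}
    for every page successfully downloaded, up to max_pages.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:  # network/HTTP errors (urllib's URLError is an OSError)
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

A real crawler should also respect `robots.txt` and add a delay between requests; those are among the things a framework handles out of the box.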

Perl also has great scraping facilities.
