Python web crawler with MySQL database

Question

I want to create or find an open source web crawler (spider/bot) written in Python. It must find and follow links, collect meta tags and meta descriptions, the title and URL of each web page, and put all the data in a MySQL database.

Does anyone know of any open source scripts that can help me? Also, if someone can give me some guidance on what I should do, that would be more than welcome.

+6

python sql mysql web-crawler web-scraping

Cosmo posmo Aug 10 '11 at 20:18

3 answers

I suggest you use Scrapy, which is a powerful scraping framework based on Twisted and lxml. It is particularly well suited to the tasks you want to perform: it uses regexp rules to follow links, and lets you use regular expressions or XPath expressions to extract data from the HTML. It also provides what they call "pipelines" to dump the data wherever you need it.

Scrapy does not provide a built-in MySQL pipeline, but someone has written one here, which you can use as a basis for your own.
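To give an idea of what such a pipeline could look like, here is a minimal sketch, not the one linked above. The method names (`open_spider`, `close_spider`, `process_item`) follow Scrapy's item pipeline interface; everything else is an assumption made for illustration: the class name `MySQLPagesPipeline`, the `pages` table and its columns, the item fields, and the connection settings. The `connect` argument is a hypothetical hook so the class can be tried without a running MySQL server.

```python
class MySQLPagesPipeline:
    """Hypothetical Scrapy item pipeline that writes one row per crawled page."""

    # Assumed schema: pages(url, title, description, keywords).
    INSERT_SQL = (
        "INSERT INTO pages (url, title, description, keywords) "
        "VALUES (%s, %s, %s, %s)"
    )

    def __init__(self, connect=None):
        # `connect` is an optional factory returning a DB-API connection.
        self._connect = connect
        self.conn = None

    def open_spider(self, spider):
        # Called by Scrapy when the spider starts: open the connection once.
        if self._connect is None:
            import MySQLdb  # the mysql-python module; must be installed

            # Placeholder credentials, replace with your own.
            self._connect = lambda: MySQLdb.connect(
                host="localhost", user="crawler", passwd="secret",
                db="crawl", charset="utf8",
            )
        self.conn = self._connect()

    def close_spider(self, spider):
        # Called by Scrapy when the spider finishes.
        self.conn.close()

    def process_item(self, item, spider):
        # Parameterized query: the driver escapes the values itself.
        cur = self.conn.cursor()
        cur.execute(
            self.INSERT_SQL,
            (
                item["url"],
                item.get("title"),
                item.get("description"),
                item.get("keywords"),
            ),
        )
        self.conn.commit()
        cur.close()
        return item  # hand the item on to any later pipeline
```

You would then enable it through the `ITEM_PIPELINES` setting of your project, pointing at wherever you put the class (for example a hypothetical `myproject.pipelines.MySQLPagesPipeline`).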

+4

Mattoufoutu Aug 10 '11 at 20:29

Scrapy is a web crawling and scraping framework that you can extend to insert the selected data into a database.

It is something like an inverse of the Django framework.

+3

hannson Aug 10 '11 at 20:29

Lynob · Accepted Answer · 2011-08-10T20:29:45+0000

Yes, I know a few:

libraries

https://github.com/djay/transmogrify.webcrawler

http://code.google.com/p/harvestman-crawler/

http://code.activestate.com/pypm/orchid/

open source crawler

http://scrapy.org/

tutorials

http://www.example-code.com/python/pythonspider.asp

P.S. I don't know if they use MySQL, because Python projects usually use SQLite or PostgreSQL, so if you want, you can use the libraries I listed, import the python-mysql module and do it yourself :D

http://sourceforge.net/projects/mysql-python/
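If you go the do-it-yourself route, the core of it might look roughly like the sketch below. It uses only the standard library to pull the title, the named meta tags and the links out of a page, and then stores one row through a DB-API connection such as the one `MySQLdb.connect(...)` returns. The names `PageParser`, `parse_page` and `save_page`, and the `pages` table with its columns, are made up for this example and do not come from any of the libraries above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class PageParser(HTMLParser):
    """Collects the title, named <meta> tags and outgoing links of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.meta = {}      # e.g. {"description": "...", "keywords": "..."}
        self.links = set()  # absolute http(s) URLs with fragments removed
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL, drop "#fragment".
            absolute, _fragment = urldefrag(urljoin(self.base_url, attrs["href"]))
            if absolute.startswith(("http://", "https://")):
                self.links.add(absolute)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(url, html):
    """Parse one downloaded page and return the filled-in parser."""
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    return parser


def save_page(conn, url, page):
    """Insert one row; `conn` is a DB-API connection using %s placeholders."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO pages (url, title, description, keywords) "
        "VALUES (%s, %s, %s, %s)",
        (url, page.title.strip(), page.meta.get("description"),
         page.meta.get("keywords")),
    )
    conn.commit()
    cur.close()
```

A crawl loop would then download each URL, call `parse_page`, call `save_page`, and add the unseen entries of `page.links` to a queue of URLs still to visit. Politeness (robots.txt, delays between requests) and error handling are left out here.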
