Python Scrapy - populate start_urls from MySQL

I am trying to populate start_urls with the results of a SELECT on a MySQL table in spider.py. When I run "scrapy runspider spider.py", I get no output; it just finishes without errors.

I checked the SELECT query in a standalone Python script, and there start_urls was populated with the records from the MySQL table.

spider.py

    from scrapy.spider import BaseSpider
    from scrapy.selector import Selector
    import MySQLdb


    class ProductsSpider(BaseSpider):
        name = "Products"
        allowed_domains = ["test.com"]
        start_urls = []

        def parse(self, response):
            print self.start_urls

        def populate_start_urls(self, url):
            conn = MySQLdb.connect(
                user='user',
                passwd='password',
                db='scrapy',
                host='localhost',
                charset="utf8",
                use_unicode=True
            )
            cursor = conn.cursor()
            cursor.execute(
                'SELECT url FROM links;'
            )
            rows = cursor.fetchall()
            for row in rows:
                start_urls.append(row[0])
            conn.close()
2 answers

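A better approach is to override the start_requests method. Nothing in Scrapy calls populate_start_urls, so start_urls stays empty and the spider finishes without making a single request.

start_requests can query your database, just as populate_start_urls does, and return an iterable of Request objects.

Rename your populate_start_urls method to start_requests, drop its unused url parameter, and change the loop as follows: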

    for row in rows:
        yield self.make_requests_from_url(row[0])
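
The database side of that loop can be tried on its own. Below is a minimal sketch, assuming a helper named iter_urls (hypothetical, not part of Scrapy) and using the standard-library sqlite3 module as a stand-in for MySQLdb so it runs without a server; both drivers follow the Python DB-API, so the cursor calls are the same ones used in the question.

```python
import sqlite3  # stand-in for MySQLdb; both follow the DB-API (PEP 249)


def iter_urls(conn, query="SELECT url FROM links"):
    """Yield one URL per row. Hypothetical helper, not part of Scrapy."""
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        for row in cursor.fetchall():
            yield row[0]  # first (and only) selected column
    finally:
        cursor.close()  # runs once the generator is exhausted or closed


# Demo: an in-memory table shaped like the question's `links` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (url TEXT)")
conn.executemany(
    "INSERT INTO links (url) VALUES (?)",
    [("http://test.com/a",), ("http://test.com/b",)],
)
urls = list(iter_urls(conn))
conn.close()
print(urls)
```

Inside the spider, start_requests would loop over such a generator and yield one request per URL. Newer Scrapy releases deprecate make_requests_from_url; there, yielding scrapy.Request(url) directly plays the same role.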

Populate start_urls in __init__ :

    def __init__(self):
        super(ProductsSpider, self).__init__()
        self.start_urls = get_start_urls()

This assumes get_start_urls() is a function you define that returns the list of URLs.
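
The answer leaves get_start_urls undefined. One possible shape for it is sketched below, under two assumptions: the function takes a connect argument (a hypothetical parameter that the original call does not have), and sqlite3 stands in for MySQLdb so the example runs without a database server.

```python
import sqlite3  # stand-in for MySQLdb so the sketch runs without a server


def get_start_urls(connect, query="SELECT url FROM links"):
    """Return every URL as a list.

    `connect` is any zero-argument callable that opens a DB-API
    connection (hypothetical signature, chosen for this sketch).
    """
    conn = connect()
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        return [row[0] for row in cursor.fetchall()]
    finally:
        conn.close()  # close the connection even if the query raises


def demo_connect():
    """Open an in-memory database with one row, for demonstration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE links (url TEXT)")
    conn.execute("INSERT INTO links (url) VALUES ('http://test.com/')")
    return conn


start_urls = get_start_urls(demo_connect)
print(start_urls)
```

With MySQLdb, connect could be a small function or lambda wrapping the MySQLdb.connect(...) call from the question. Because the whole list is built before the crawl starts, this suits tables small enough to hold in memory.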


Source: https://habr.com/ru/post/958553/

