Closing database connections in a Scrapy pipeline and middleware

Question

I have a Scrapy project that uses a custom middleware and a custom pipeline to check and store records in a PostgreSQL database. The middleware looks something like this:

    class ExistingLinkCheckMiddleware(object):

        def __init__(self):
            ...  # open connection to database

        def process_request(self, request, spider):
            ...  # before each request, check in the DB
                 # that the page hasn't been scraped before

The pipeline looks similar:

    class MachinelearningPipeline(object):

        def __init__(self):
            ...  # open connection to database

        def process_item(self, item, spider):
            ...  # save the item to the database

It works fine, but I cannot find a way to cleanly close these database connections when the spider finishes, which annoys me.

Does anyone know how to do this?

+4

python web-scraping scrapy

Jamie brown May 23 '13 at 10:10

1 answer

alecxe · Accepted Answer · 2013-05-23T10:30:17+0000

I think the best way to do this is to use Scrapy's spider_closed signal, for example:

    from scrapy import signals
    from scrapy.xlib.pydispatch import dispatcher


    class ExistingLinkCheckMiddleware(object):

        def __init__(self):
            # open connection to database
            dispatcher.connect(self.spider_closed, signals.spider_closed)

        def spider_closed(self, spider, reason):
            # close db connection
            pass

        def process_request(self, request, spider):
            # before each request check in the DB
            # that the page hasn't been scraped before
            pass
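For the pipeline side, a signal handler may not be needed: Scrapy calls an item pipeline's open_spider and close_spider methods, if they are defined, when the spider is opened and closed. Below is a minimal sketch of that approach, not the asker's actual code. It assumes sqlite3 as a stand-in for the Postgres connection so the example runs on its own; the db_path argument, the pages table, and the "url" item field are hypothetical names chosen for illustration.

```python
import sqlite3


class MachinelearningPipeline:
    """Item pipeline that owns one database connection per spider run."""

    def __init__(self, db_path=":memory:"):
        # db_path is a hypothetical parameter; sqlite3 stands in for Postgres.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called by Scrapy when the spider is opened: connect here.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY)"
        )

    def process_item(self, item, spider):
        # Save the item; INSERT OR IGNORE skips URLs that are already stored.
        self.conn.execute(
            "INSERT OR IGNORE INTO pages (url) VALUES (?)", (item["url"],)
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Called by Scrapy when the spider is closed: release the connection.
        if self.conn is not None:
            self.conn.close()
            self.conn = None
```

With a real Postgres driver, the connect and close calls would be swapped for the driver's equivalents; the open_spider / close_spider structure stays the same.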
