Closing database connections in a Scrapy pipeline and middleware

I have a Scrapy project that uses a custom downloader middleware and a custom pipeline to check and store records in a PostgreSQL database. The middleware looks something like this:

  class ExistingLinkCheckMiddleware(object):

      def __init__(self):
          # ... open connection to database
          pass

      def process_request(self, request, spider):
          # ... before each request, check in the DB
          # that the page hasn't been scraped before
          return None

The pipeline looks similar:

  class MachinelearningPipeline(object):

      def __init__(self):
          # ... open connection to database
          pass

      def process_item(self, item, spider):
          # ... save the item to the database
          return item

This works fine, but I can't find a way to cleanly close these database connections when the spider finishes, which bothers me.

Does anyone know how to do this?

1 answer

I think the best way to do this is to use Scrapy's spider_closed signal and close the connection in its handler, for example:

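  from scrapy import signals

  class ExistingLinkCheckMiddleware(object):

      def __init__(self):
          # open connection to database
          self.conn = None

      @classmethod
      def from_crawler(cls, crawler):
          # Register the handler through crawler.signals; the old
          # scrapy.xlib.pydispatch dispatcher was deprecated and removed.
          middleware = cls()
          crawler.signals.connect(middleware.spider_closed,
                                  signal=signals.spider_closed)
          return middleware

      def spider_closed(self, spider, reason):
          # close db connection
          if self.conn is not None:
              self.conn.close()
              self.conn = None

      def process_request(self, request, spider):
          # before each request check in the DB
          # that the page hasn't been scraped before
          # (raise scrapy.exceptions.IgnoreRequest to skip it)
          return None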
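For the pipeline, a signal is not strictly needed: Scrapy calls an item pipeline's open_spider and close_spider methods automatically when the spider opens and closes, so the connection can be opened and closed there. Below is a minimal sketch under stated assumptions: sqlite3 stands in for PostgreSQL so it runs without a server, and the db_path parameter, the items table and the url/title fields are hypothetical. With psycopg2 the connect/commit/close calls are the same DB-API calls, but the SQL would use %s placeholders and PostgreSQL's ON CONFLICT DO NOTHING instead of INSERT OR IGNORE.

```python
import sqlite3


class MachinelearningPipeline:
    """Sketch: one DB connection tied to the spider's lifetime.

    Assumption: sqlite3 stands in for PostgreSQL; db_path, the "items"
    table and the "url"/"title" fields are hypothetical.
    """

    def __init__(self, db_path=":memory:"):
        # Hypothetical parameter: where the database lives.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called by Scrapy once, when the spider is opened.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (url TEXT PRIMARY KEY, title TEXT)"
        )

    def process_item(self, item, spider):
        # Save the item; a URL that is already stored is silently skipped.
        self.conn.execute(
            "INSERT OR IGNORE INTO items (url, title) VALUES (?, ?)",
            (item["url"], item.get("title")),
        )
        return item

    def close_spider(self, spider):
        # Called by Scrapy once, when the spider is closed:
        # commit pending writes and release the connection.
        if self.conn is not None:
            self.conn.commit()
            self.conn.close()
            self.conn = None
```

Opening the connection in open_spider rather than in __init__ keeps its lifetime symmetric with close_spider, so every crawl that opens a connection also releases it.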


Hope this helps.


Source: https://habr.com/ru/post/1482354/

