I have a Scrapy project that uses custom middleware and a custom pipeline to check records against, and store them in, a PostgreSQL database. The middleware looks something like this:
```python
class ExistingLinkCheckMiddleware(object):
    def __init__(self):
        ...  # open a connection to the database

    def process_request(self, request, spider):
        ...  # before each request, check in the DB that
             # the page hasn't been scraped before
```

The pipeline looks similar:
```python
class MachinelearningPipeline(object):
    def __init__(self):
        ...  # open a connection to the database

    def process_item(self, item, spider):
        ...  # save the item to the database
```

This works fine, but I can't find a way to cleanly close these database connections when the spider finishes, which bothers me.
Does anyone know how to do this?
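Is something like the following the right direction? This is only a rough sketch: it relies on Scrapy's documented `spider_closed` signal for the middleware and the pipeline's `close_spider` hook, and the psycopg2 DSN strings are placeholders I made up.

```python
import psycopg2
from scrapy import signals

class ExistingLinkCheckMiddleware(object):
    def __init__(self):
        # "dbname=scrapy" is a placeholder DSN, not my real one
        self.conn = psycopg2.connect("dbname=scrapy")

    @classmethod
    def from_crawler(cls, crawler):
        mw = cls()
        # Ask Scrapy to call mw.spider_closed when the spider finishes
        crawler.signals.connect(mw.spider_closed, signal=signals.spider_closed)
        return mw

    def spider_closed(self, spider):
        self.conn.close()  # close the DB connection on shutdown

class MachinelearningPipeline(object):
    def open_spider(self, spider):
        self.conn = psycopg2.connect("dbname=scrapy")  # placeholder DSN

    def close_spider(self, spider):
        # Scrapy calls this on every pipeline when the spider closes
        self.conn.close()
```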