How to override Scrapy settings when running a spider from a script

I want to run Scrapy from a script. I want to load all the settings from settings.py, but be able to change some of them:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

# What I'm missing here: being able to set or override one or two of the settings


# 'testspider' is the name of one of the spiders of the project.
process.crawl('testspider', domain='scrapinghub.com')
process.start() # the script will block here until the crawling is finished

This didn't let me override anything, so I tried the following:

import scrapy

settings = scrapy.settings.Settings()
settings.set('RETRY_TIMES', 10)

but it didn’t work.

Note: I am using the latest version of Scrapy.

2 answers

One way to override some settings is to set custom_settings, a class attribute of the spider, from our script.

So I imported the spider class and then redefined custom_settings:

from testspiders.spiders.followall import FollowAllSpider 

FollowAllSpider.custom_settings = {'RETRY_TIMES': 10}

So this is the whole script:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from testspiders.spiders.followall import FollowAllSpider 

FollowAllSpider.custom_settings = {'RETRY_TIMES': 10}
process = CrawlerProcess(get_project_settings())

# 'followall' is the name of the spider we patched above.
process.crawl('followall', domain='scrapinghub.com')
process.start()  # the script will block here until the crawling is finished

You can also set the value directly on the process settings, without importing the spider class. Use a high priority such as 'cmdline' so the value overrides the project settings:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
process.settings.set('RETRY_TIMES', 10, priority='cmdline')

process.crawl('testspider', domain='scrapinghub.com')
process.start()

Source: https://habr.com/ru/post/1611369/

