Using spider arguments in the pipeline on __init__

I have a Scrapy pipelines.py and I want to get the spider arguments there. In my spider.py it works fine:

    class MySpider(CrawlSpider):
        def __init__(self, user_id='', domain_id='', *args, **kwargs):
            super(MySpider, self).__init__(*args, **kwargs)
            print user_id
            ...

Now I need "user_id" in my pipelines.py to create an SQLite database such as "domain-123.db". I have searched all over the web for this problem, but I cannot find a solution.

Can someone help me?

PS: Yes, I tried using super() inside my pipeline class, the same way as in spider.py, but it does not work.

2 answers

Set the arguments inside the spider constructor:

    class MySpider(CrawlSpider):
        def __init__(self, user_id='', *args, **kwargs):
            self.user_id = user_id
            super(MySpider, self).__init__(*args, **kwargs)

And read them in the open_spider() method of your pipeline:

    def open_spider(self, spider):
        print spider.user_id
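
For the original use case (creating an SQLite database named after a spider argument), a minimal sketch along these lines should work; the SQLitePipeline class name and the "domain-123.db" naming scheme are assumptions based on the question:

    import sqlite3

    class SQLitePipeline(object):
        def open_spider(self, spider):
            # Assumed naming scheme: e.g. "domain-123.db" when the spider
            # was started with user_id=123.
            self.conn = sqlite3.connect('domain-%s.db' % spider.user_id)
            self.cursor = self.conn.cursor()

        def process_item(self, item, spider):
            # Insert/update the item here.
            return item

        def close_spider(self, spider):
            self.conn.commit()
            self.conn.close()

Remember to enable the pipeline in ITEM_PIPELINES in settings.py.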

I may be too late to give the OP a useful answer, but for anyone who comes to this question in the future (like me), you should check out from_crawler and/or from_settings.

That way you can pass your arguments the way you want.

Check out: https://doc.scrapy.org/en/latest/topics/item-pipeline.html#from_crawler

from_crawler(cls, crawler)

If present, this classmethod is called to create a pipeline instance from a Crawler. It must return a new instance of the pipeline. The Crawler object provides access to all Scrapy core components, such as settings and signals; it is a way for the pipeline to access them and hook its functionality into Scrapy.

Parameters: crawler (Crawler object) - crawler that uses this pipeline
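
A minimal sketch of that pattern, assuming a hypothetical custom setting DB_DIR (both the setting name and the pipeline class are made up for illustration):

    class MyPipeline(object):
        def __init__(self, db_dir):
            self.db_dir = db_dir

        @classmethod
        def from_crawler(cls, crawler):
            # crawler.settings gives access to the project settings, so the
            # pipeline can be configured without touching the spider.
            return cls(db_dir=crawler.settings.get('DB_DIR', '.'))

        def open_spider(self, spider):
            # Spider arguments such as user_id are still read from the
            # spider object, as in the first answer.
            print spider.user_id, self.db_dir

from_crawler() is mainly useful for reading settings and hooking into signals; the spider arguments themselves are easiest to read from the spider object, as shown above.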

