Using spider arguments in the pipeline on __init__

I have a Scrapy pipelines.py and I want to get the spider arguments there. In my spider.py it works fine:

    class MySpider(CrawlSpider):
        def __init__(self, user_id='', domain_id='', *args, **kwargs):
            super(MySpider, self).__init__(*args, **kwargs)
            print user_id
            ...

Now I need "user_id" in my pipelines.py to create an SQLite database such as "domain-123.db". I have searched all over the web for this problem, but I cannot find a solution.

Can someone help me?

PS: Yes, I tried using super() inside my pipeline class, the same way as in spider.py, but it does not work.

2 answers

Set the arguments inside the spider constructor:

    class MySpider(CrawlSpider):
        def __init__(self, user_id='', *args, **kwargs):
            self.user_id = user_id
            super(MySpider, self).__init__(*args, **kwargs)

And read them in the open_spider() method of your pipeline:

    def open_spider(self, spider):
        print spider.user_id
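
For the original use case (creating an SQLite database named after a spider argument), a minimal sketch along these lines should work; the SQLitePipeline class name and the "domain-123.db" naming scheme are assumptions based on the question:

    import sqlite3

    class SQLitePipeline(object):
        def open_spider(self, spider):
            # Assumed naming scheme: e.g. "domain-123.db" when the spider
            # was started with user_id=123.
            self.conn = sqlite3.connect('domain-%s.db' % spider.user_id)
            self.cursor = self.conn.cursor()

        def process_item(self, item, spider):
            # Insert/update the item here.
            return item

        def close_spider(self, spider):
            self.conn.commit()
            self.conn.close()

Remember to enable the pipeline in ITEM_PIPELINES in settings.py.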

I may be too late to give the OP a useful answer, but for anyone who comes to this question in the future (like me), you should check out from_crawler and/or from_settings.

That way you can pass your arguments the way you want.

Check out: https://doc.scrapy.org/en/latest/topics/item-pipeline.html#from_crawler

from_crawler(cls, crawler)

If present, this classmethod is called to create a pipeline instance from a Crawler. It must return a new instance of the pipeline. The Crawler object provides access to all Scrapy core components, such as settings and signals; it is a way for the pipeline to access them and hook its functionality into Scrapy.

Parameters: crawler (Crawler object) - crawler that uses this pipeline
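
A minimal sketch of that pattern, assuming a hypothetical custom setting DB_DIR (both the setting name and the pipeline class are made up for illustration):

    class MyPipeline(object):
        def __init__(self, db_dir):
            self.db_dir = db_dir

        @classmethod
        def from_crawler(cls, crawler):
            # crawler.settings gives access to the project settings, so the
            # pipeline can be configured without touching the spider.
            return cls(db_dir=crawler.settings.get('DB_DIR', '.'))

        def open_spider(self, spider):
            # Spider arguments such as user_id are still read from the
            # spider object, as in the first answer.
            print spider.user_id, self.db_dir

from_crawler() is mainly useful for reading settings and hooking into signals; the spider arguments themselves are easiest to read from the spider object, as shown above.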

