In a Scrapy class, how can I call one function from another?

Question

I know this is a newbie question, and really a basic Python question, but it comes up in the context of Scrapy and I cannot find the answer anywhere.

When I run this spider code:

    import scrapy

    from tutorial.items import DmozItem

    class DmozSpider(scrapy.Spider):
        name = "dmoz"
        allowed_domains = ["lib-web.org"]
        start_urls = [
            "http://www.lib-web.org/united-states/public-libraries/michigan/"
        ]

        count = 0

        def increment(self):
            global count
            count += 1

        def getCount(self):
            global count
            return count

        def parse(self, response):
            increment()
            for sel in response.xpath('//div/div/div/ul/li'):
                item = DmozItem()
                item['title'] = sel.xpath('a/text()').extract()
                item['link'] = sel.xpath('a/@href').extract()
                item['desc'] = sel.xpath('p/text()').extract()
                x = getCount()
                print x
                yield item

DmozItem:

    import scrapy

    class DmozItem(scrapy.Item):
        title = scrapy.Field()
        link = scrapy.Field()
        desc = scrapy.Field()

I get this error:

      File "/Users/Admin/scpy_projs/tutorial/tutorial/spiders/dmoz_spider.py", line 23, in parse
        increment()
    NameError: global name 'increment' is not defined

Why can't I call increment() from parse(self, response)? How can I make this work?

Thanks for any help.

python python-2.7 web-scraping scrapy

ryan71 Nov 13 '15 at 18:35

1 answer

alecxe · Accepted Answer · 2015-11-13T18:43:49+0000

increment() is an instance method of your spider - use self.increment() to call it.
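
A minimal sketch of why the bare call fails, using a plain class instead of a Scrapy spider so that it runs without Scrapy installed. The class name `Counter` and the methods `broken` and `working` are made up for illustration and are not part of the question's code.

```python
class Counter(object):
    """Hypothetical stand-in for the spider; not part of Scrapy."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1

    def broken(self):
        # A bare name is looked up in the local, enclosing, global and
        # builtin scopes, never in the class body, so this raises NameError.
        increment()

    def working(self):
        # Looking the method up on the instance finds it and binds self.
        self.increment()


c = Counter()
try:
    c.broken()
    error = None
except NameError as exc:
    error = exc

c.working()
print(type(error).__name__)
print(c.count)
```

The same lookup rule applies inside `parse`: methods defined on the spider are reachable only through `self` (or the class), not as bare names.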

In addition, there is no need to use global variables - define count as an instance variable.
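
As a rough illustration of why `global count` would not have helped either: `count = 0` written in a class body creates a class attribute, not a module-level global. The class `Demo` and its method names below are hypothetical and exist only for this sketch.

```python
class Demo(object):
    # Class attribute: stored on the class, not in the module's globals.
    count = 0

    def increment_global(self):
        global count
        count += 1  # fails: there is no module-level name called count

    def increment_instance(self):
        # Reads the class attribute (0), then stores the result on the instance.
        self.count += 1


d = Demo()
try:
    d.increment_global()
    global_ok = True
except NameError:
    global_ok = False

d.increment_instance()
print(global_ok)
print(d.count)
print(Demo.count)
```

Note that `self.count += 1` leaves `Demo.count` untouched and creates a per-instance value, which is one reason to initialise the counter in `__init__`.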

Fixed Version:

    import scrapy

    from tutorial.items import DmozItem

    class DmozSpider(scrapy.Spider):
        name = "dmoz"
        allowed_domains = ["lib-web.org"]
        start_urls = [
            "http://www.lib-web.org/united-states/public-libraries/michigan/"
        ]

        def __init__(self, *args, **kwargs):
            super(DmozSpider, self).__init__(*args, **kwargs)
            self.count = 0

        def increment(self):
            self.count += 1

        def getCount(self):
            return self.count

        def parse(self, response):
            self.increment()
            for sel in response.xpath('//div/div/div/ul/li'):
                item = DmozItem()
                item['title'] = sel.xpath('a/text()').extract()
                item['link'] = sel.xpath('a/@href').extract()
                item['desc'] = sel.xpath('p/text()').extract()
                x = self.getCount()
                print x
                yield item

You can also define count as a property.
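
One possible reading of the property suggestion, sketched on a plain class so that it runs without Scrapy. The names `PageCounter` and `_count` are assumptions made for this example; the answer itself does not show any property code.

```python
class PageCounter(object):
    """Hypothetical stand-in for the spider; not part of Scrapy."""

    def __init__(self):
        self._count = 0  # private backing attribute

    @property
    def count(self):
        # Read-only view of the counter; replaces a getCount() method.
        return self._count

    def increment(self):
        self._count += 1


counter = PageCounter()
counter.increment()
counter.increment()
print(counter.count)
```

With this layout the caller reads `counter.count` like an attribute, and since no setter is defined, assigning to `counter.count` from outside raises `AttributeError`.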
