Scrapy: skip item and continue with execution

Question


I am writing an RSS spider. I want the spider to ignore the current node and continue execution if in the current item ... So far I have this:

    if info.startswith('Foo'):
        item['foo'] = info.split(':')[1]
    else:
        return None

(info is a string that has been sanitized from an XPath ...)

But I get this exception:

    exceptions.TypeError: You cannot return an "NoneType" object from a spider

So how can I ignore this node and continue execution?

+4

python web-crawler scrapy

anders Feb 18 '11 at 10:23


2 answers

There is an undocumented method that I found when I had to skip an item during parsing from code outside the callback function itself.

Just raise StopIteration anywhere during parsing.

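    from scrapy import Spider


    class MySpider(Spider):
        def parse(self, response):
            value1 = self.parse_something1()
            value2 = self.parse_something2()
            yield Item(value1, value2)  # Item stands for your own item class

        def parse_something1(self):
            try:
                return get_some_value()  # placeholder for the real extraction
            except Exception:
                self.skip_item()

        def parse_something2(self):
            if something_wrong:  # placeholder condition
                self.skip_item()

        def skip_item(self):
            # Abort the callback from anywhere below parse()
            raise StopIteration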
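One caveat worth checking before relying on this: since Python 3.7 (PEP 479), a StopIteration raised inside a generator body is converted into a RuntimeError, so the trick above only behaves as described on older interpreters. It also ends the whole callback rather than skipping a single node. A minimal sketch of an alternative follows, assuming a custom exception that the callback catches itself. The names SkipItem, extract_foo and the nodes argument are hypothetical and stand in for a real response; nothing here is Scrapy API.

```python
class SkipItem(Exception):
    """Hypothetical marker exception: the current node should be skipped."""


def extract_foo(info):
    # Helper that lives outside the callback, like parse_something1 above.
    if not info.startswith('Foo'):
        raise SkipItem(info)
    return info.split(':')[1]


def parse(nodes):
    # Stand-in for a spider callback; `nodes` replaces the real response.
    for info in nodes:
        try:
            value = extract_foo(info)
        except SkipItem:
            continue  # ignore this node and move on to the next one
        yield {'foo': value}


def parse_with_stop_iteration(nodes):
    # Same loop, but raising StopIteration inside the generator body.
    for info in nodes:
        if not info.startswith('Foo'):
            raise StopIteration
        yield {'foo': info.split(':')[1]}


items = list(parse(['Foo:1', 'Bar:2', 'Foo:3']))
```

With the custom exception, only the non-matching node is dropped and the remaining nodes are still processed; consuming parse_with_stop_iteration on a non-matching node raises RuntimeError on Python 3.7 and later.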

+1

Nour chawich Jun 11 '17 at 22:20


seriyPS · Accepted Answer · 2011-02-18T13:32:37+0000

    def parse(self, response):
        # make some manipulations
        if info.startswith('Foo'):
            item['foo'] = info.split(':')[1]
            return [item]
        else:
            return []

But it’s better not to return a list at all: yield the item when it matches, and do nothing otherwise.

    def parse(self, response):
        # make some manipulations
        if info.startswith('Foo'):
            item['foo'] = info.split(':')[1]
            yield item
        else:
            return
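Why the yield version avoids the TypeError can be seen without Scrapy: calling a function that contains yield always returns a generator object, never None, so a callback that yields nothing simply produces an empty iterable. The sketch below assumes a plain function that takes the info string directly and uses a dict in place of a real item; both are simplifications for illustration.

```python
def parse(info):
    # Plain-Python stand-in for the callback; `info` is passed in directly.
    item = {}
    if info.startswith('Foo'):
        item['foo'] = info.split(':')[1]
        yield item
    # Falling off the end (or a bare `return`) yields nothing at all.


result = parse('Baz:qux')      # a generator object, not None
matched = list(parse('Foo:bar'))
skipped = list(result)
```

Here matched holds one item and skipped is an empty list, which is the "skip and continue" behaviour the question asks for.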
