Scrapy: skip the current node and continue execution

I am writing an RSS spider. I want the spider to ignore the current node and continue execution if in the current item ... So far I have this:

    if info.startswith('Foo'):
        item['foo'] = info.split(':')[1]
    else:
        return None

(info is a string that has been sanitized from the xpath to ...)

But I get this exception:

    exceptions.TypeError: You cannot return an "NoneType" object from a spider

So how can I ignore this node and continue execution?

+4
2 answers

    def parse(self, response):
        # make some manipulations
        if info.startswith('Foo'):
            item['foo'] = info.split(':')[1]
            return [item]
        else:
            return []

But it is better not to return a list at all: yield the items you want to keep, and yield nothing for the ones you want to skip.

    def parse(self, response):
        # make some manipulations
        if info.startswith('Foo'):
            item['foo'] = info.split(':')[1]
            yield item
        else:
            return
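For a feed with several nodes, the same idea can be sketched as a loop that yields only the matching ones. This is a minimal illustration rather than the asker's actual spider: `parse_nodes` and `nodes` are hypothetical names, and plain dicts stand in for Scrapy items so the snippet runs without Scrapy installed.

```python
def parse_nodes(nodes):
    """Yield one item per node that starts with 'Foo'; skip all others."""
    for info in nodes:
        if not info.startswith('Foo'):
            continue  # ignore this node and move on to the next one
        yield {'foo': info.split(':')[1]}


# Hypothetical input: already-cleaned strings, one per RSS node.
items = list(parse_nodes(['Foo:1', 'Bar:2', 'Foo:3']))
print(items)  # [{'foo': '1'}, {'foo': '3'}]
```

Because the generator simply produces nothing for a skipped node, there is never a None value for the framework to reject.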
+10

There is an undocumented trick that I found when I had to skip an item during parsing, but from code outside the callback function itself (for example, from a helper method it calls).

Just raise StopIteration anywhere during parsing.

    class MySpider(Spider):
        def parse(self, response):
            value1 = self.parse_something1()
            value2 = self.parse_something2()
            yield Item(value1, value2)

        def parse_something1(self):
            try:
                return get_some_value()
            except Exception:
                self.skip_item()

        def parse_something2(self):
            if something_wrong:
                self.skip_item()

        def skip_item(self):
            # abort the current callback before the item is yielded
            raise StopIteration
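One caveat: since Python 3.7 (PEP 479), a StopIteration that escapes into a generator body is converted to a RuntimeError, so this trick no longer ends the callback quietly on current interpreters. A possible alternative, sketched here under the assumption that the helpers are called from inside the callback, is to raise a custom exception and catch it in the generator. `SkipItem`, `extract_foo` and `parse_nodes` are hypothetical names, not part of Scrapy, and dicts again stand in for items.

```python
class SkipItem(Exception):
    """Hypothetical signal: a helper raises it to abandon the current item."""


def extract_foo(info):
    # Helper outside the callback; it cannot yield, so it raises instead.
    if not info.startswith('Foo'):
        raise SkipItem(info)
    return info.split(':')[1]


def parse_nodes(nodes):
    for info in nodes:
        try:
            foo = extract_foo(info)
        except SkipItem:
            continue  # drop this node, keep processing the rest
        yield {'foo': foo}


result = list(parse_nodes(['Foo:a', 'junk', 'Foo:b']))
print(result)  # [{'foo': 'a'}, {'foo': 'b'}]
```

Catching the exception inside the loop also means one bad node does not end the whole callback, which raising StopIteration would do.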
+1

Source: https://habr.com/ru/post/1340298/

