Scrapy spider doesn't work

Since nothing works, I started a new project with

python scrapy-ctl.py startproject Nu 

I followed the tutorial closely, created the folders, and wrote a new spider:

 from scrapy.contrib.spiders import CrawlSpider, Rule
 from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
 from scrapy.selector import HtmlXPathSelector
 from scrapy.item import Item
 from Nu.items import NuItem
 from urls import u

 class NuSpider(CrawlSpider):
     domain_name = "wcase"
     start_urls = ['http://www.whitecase.com/aabbas/']

     names = hxs.select('//td[@class="altRow"][1]/a/@href').re('/.a\w+')

     u = names.pop()

     rules = (Rule(SgmlLinkExtractor(allow=(u, )), callback='parse_item'),)

     def parse(self, response):
         self.log('Hi, this is an item page! %s' % response.url)

         hxs = HtmlXPathSelector(response)
         item = Item()

         item['school'] = hxs.select('//td[@class="mainColumnTDa"]').re('(?<=(JD,\s))(.*?)(\d+)')
         return item

 SPIDER = NuSpider()

and when I run

 C:\Python26\Scripts\Nu>python scrapy-ctl.py crawl wcase 

I get

 [Nu] ERROR: Could not find spider for domain: wcase 

Other spiders are at least recognized by Scrapy, but this one is not. What am I doing wrong?

Thanks for your help!

-1
5 answers

Please also check your Scrapy version. In the latest version, the attribute name is used instead of domain_name to uniquely identify a spider.
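For example, on a newer Scrapy release the spider would be declared roughly like this (a minimal sketch, keeping the old contrib import path from the question; only the identifying attribute changes):

 from scrapy.contrib.spiders import CrawlSpider

 class NuSpider(CrawlSpider):
     name = "wcase"            # newer Scrapy: "name" identifies the spider
     # domain_name = "wcase"   # older Scrapy versions used this attribute instead
     start_urls = ['http://www.whitecase.com/aabbas/']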

+6

These two lines look as if they are causing problems:

 u = names.pop()
 rules = (Rule(SgmlLinkExtractor(allow=(u, )), callback='parse_item'),)
  • Each time the script is run, only one rule will be executed. Consider creating a rule for each URL.
  • You did not create a parse_item callback, which means the rule does nothing. The only callback you defined is parse, which changes the default behaviour of the spider.

In addition, here are some things to watch out for.

  • CrawlSpider does not like having its default parse method overloaded. Search for parse_start_url in the documentation or the docstrings. You will see that it is the preferred way to override the default parse method for your start URLs.
  • NuSpider.hxs is referenced before it is ever defined. A rough sketch combining these points follows after this list.
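Putting those points together, the spider could look something like the sketch below. This keeps the old contrib imports and the XPath expressions from the question, assumes NuItem declares a school field, and uses a placeholder allow pattern for the profile links; it is not a drop-in fix, just an illustration of where each piece belongs:

 from scrapy.contrib.spiders import CrawlSpider, Rule
 from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
 from scrapy.selector import HtmlXPathSelector
 from Nu.items import NuItem

 class NuSpider(CrawlSpider):
     domain_name = "wcase"  # "name" on newer Scrapy versions
     start_urls = ['http://www.whitecase.com/aabbas/']

     # One rule extracts the profile links and hands them to parse_item.
     # The allow pattern is a placeholder; adjust it to the real URLs.
     rules = (
         Rule(SgmlLinkExtractor(allow=(r'/aabbas/.+',)), callback='parse_item'),
     )

     # If the start URLs themselves must be scraped, override parse_start_url
     # here instead of parse.

     def parse_item(self, response):
         # This is the callback named in the rule, not the spider's default parse.
         hxs = HtmlXPathSelector(response)
         item = NuItem()
         item['school'] = hxs.select('//td[@class="mainColumnTDa"]').re(r'(?<=(JD,\s))(.*?)(\d+)')
         return item

 SPIDER = NuSpider()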
+3

Have you included the spider in the SPIDER_MODULES list in your scrapy_settings.py?

This isn't mentioned anywhere in the tutorial, but it is required.
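For a project created with startproject Nu, the setting would look roughly like this (a sketch, assuming the spiders live in the Nu/spiders package and the settings file is named scrapy_settings.py as above):

 # in the project's settings file
 SPIDER_MODULES = ['Nu.spiders']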

+2

I believe there are errors in that code: names = hxs.select(...) will not work at class level, because the hxs object has not been defined at that point.

Try running python yourproject/spiders/domain.py to get syntax errors.
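In other words, the selector can only be built once a response exists, so that line belongs inside a callback. A minimal sketch of the difference, reusing the XPath from the question (parse_start_url is the method CrawlSpider calls for the start URLs' responses):

 from scrapy.contrib.spiders import CrawlSpider
 from scrapy.selector import HtmlXPathSelector

 class NuSpider(CrawlSpider):
     domain_name = "wcase"
     start_urls = ['http://www.whitecase.com/aabbas/']

     # At class level there is no response yet, so this raises a NameError:
     # names = hxs.select('//td[@class="altRow"][1]/a/@href').re(r'/.a\w+')

     def parse_start_url(self, response):
         # Inside a callback the selector is created from the response first.
         hxs = HtmlXPathSelector(response)
         names = hxs.select('//td[@class="altRow"][1]/a/@href').re(r'/.a\w+')
         self.log('found %d profile links' % len(names))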

+2

You are overriding the parse method instead of implementing a new parse_item method.

+2
