Scrapy newbie question - can't get the tutorial file to work

I'm a complete newbie to Python and Scrapy, so I started by trying to replicate the tutorial. I am trying to scrape the www.dmoz.org website as the tutorial describes.

I wrote dmoz_spider.py as follows:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from dmoz.items import DmozItem

class DmozSpider(BaseSpider):
   name = "dmoz.org"
   allowed_domains = ["dmoz.org"]
   start_urls = [
       "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
       "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
   ]

   def parse(self, response):
       hxs = HtmlXPathSelector(response)
       sites = hxs.select('//ul/li')
       items = []
       for site in sites:
           item = DmozItem()
           item['title'] = site.select('a/text()').extract()
           item['link'] = site.select('a/@href').extract()
           item['desc'] = site.select('text()').extract()
           items.append(item)
       return items

but the output I get from crawling the site is not what the tutorial says I should get.
Any idea what I'm messing up?

python scrapy

racket99 Dec 16 '10 at 23:47

3 answers

Dang · Answer 1 · 2012-08-30T02:11:38+0000

I had this problem. Make sure you make the following change, as the tutorial says.

Open items.py and check whether you changed the class

class TutorialItem(Item):
    title = Field()
    link = Field()
    desc = Field()

to

class DmozItem(Item):
    title = Field()
    link = Field()
    desc = Field()
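
The spider in the question does `from dmoz.items import DmozItem`, so if items.py still only defines TutorialItem, that import would fail with an ImportError. A minimal sketch for checking which classes a file defines, using only the standard library; the helper name `defined_classes` and the sample source string are assumptions made up for illustration, not part of Scrapy:

```python
import ast


def defined_classes(source):
    """Return the names of the top-level classes defined in Python source text."""
    tree = ast.parse(source)
    return [node.name for node in tree.body if isinstance(node, ast.ClassDef)]


# Sample items.py content as it might look before the rename (illustrative only).
sample = (
    "class TutorialItem(Item):\n"
    "    title = Field()\n"
)

names = defined_classes(sample)
if "DmozItem" not in names:
    print("items.py defines %s, but the spider imports DmozItem" % names)
```

To run it against a real project, read the text of your own items.py and pass it in place of `sample`.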

Ptival · Answer 2 · 2010-12-24T04:59:10+0000

, , . , , ? ( , ...)

saurshaz · Answer 3 · 2013-09-03T07:08:57+0000

You need to go to the directory containing the settings.py file and run

scrapy crawl dmoz

from there.

See https://github.com/scrapy/dirbot for a clear example of the project structure.
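
As far as I know, Scrapy recognizes a project by a scrapy.cfg file in the current directory or one of its parent directories, which is why the directory you run the command from matters. A rough sketch, assuming the standard tutorial layout where scrapy.cfg sits at the project root above settings.py; the helper name `find_project_root` is hypothetical and not a Scrapy API:

```python
import os


def find_project_root(start):
    """Walk upward from `start`; return the first directory containing scrapy.cfg, or None."""
    current = os.path.abspath(start)
    while True:
        if os.path.isfile(os.path.join(current, "scrapy.cfg")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without finding it
            return None
        current = parent


root = find_project_root(os.getcwd())
if root is None:
    print("No scrapy.cfg found here or above; 'scrapy crawl' will not see the project")
else:
    print("Project root: %s" % root)
```

If this prints that no scrapy.cfg was found, change into the project directory before running the crawl command.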
