Difference between Scraper, Crawler and Spider in Scrapy Context

Question

I am trying to read the Scrapy code. The words scraper, crawler and spider are confusing. For instance:

scrapy.core.scraper
scrapy.crawler
scrapy.spiders

Can someone explain the meanings and differences of these terms in the context of Scrapy? Thanks in advance.

web-crawler scrapy scrapy-spider

Frozen flame Dec 14 '15 at 6:20

1 answer

bosnjak · Accepted Answer · 2015-12-16T15:02:11+0000

Crawler ( scrapy.crawler ) is the main entry point to the Scrapy API. It provides access to all of Scrapy's core components, and it is used to hook extension functionality into Scrapy.

The Scraper component ( scrapy.core.scraper ) is responsible for parsing responses and extracting information from them. It is run from the Engine, and it is used to run your spiders.

scrapy.spiders is a module that contains the base Spider implementation (which you use to write your own spiders), along with some common spiders available out of the box (for example, CrawlSpider for crawling based on a set of rules, SitemapSpider for crawling based on a sitemap, or XMLFeedSpider for crawling XML feeds).

Additional information is available in the official documentation:
http://doc.scrapy.org/en/latest/topics/spiders.html?highlight=crawlspider#spiders
http://doc.scrapy.org/en/latest/topics/api.html?highlight=scrapy.crawler#module-scrapy.crawler
