Hi, how can I make my CrawlSpider work? I can log in, but after that nothing happens and nothing gets scraped. I also read the Scrapy documentation and I really don't understand the rules used for crawling. Why does nothing happen after "Successfully logged in. Let start crawling!"?
I also had this rule at the end of my else statement, but deleted it because it was never even reached there (it was inside my else block). So I moved it to the beginning of the start_requests() method, but then I got errors, so I deleted my rules:
rules = (Rule(extractor, callback='parse_item', follow=True),)
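From what I understand of the docs, on a CrawlSpider the rules are supposed to be a class-level attribute, something like the sketch below (the spider name, domain, and link-extractor pattern are just an example, not from my project):

from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule

class ExampleSpider(CrawlSpider):
    name = 'example'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/']

    # rules is compiled when the spider is instantiated, so it has to sit
    # here at class level, not inside start_requests() or a callback
    rules = (
        Rule(SgmlLinkExtractor(allow=r'/people/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        self.log('Visited %s' % response.url)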
my code is:
from scrapy.contrib.spiders.init import InitSpider
from scrapy.http import Request, FormRequest
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import Rule
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from linkedconv.items import LinkedconvItem


class LinkedPySpider(CrawlSpider):
    name = 'LinkedPy'
    allowed_domains = ['linkedin.com']
    login_page = 'https://www.linkedin.com/uas/login'
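The snippet above got cut off after login_page; the rest of the spider follows the usual start_requests → FormRequest → check_login_response pattern, roughly like the sketch below, continuing the same class (the form field names, the "Sign Out" check, and the item field are placeholders from memory, not my exact code):

    def start_requests(self):
        # Fetch the login page first instead of going straight to a start URL.
        yield Request(url=self.login_page, callback=self.login, dont_filter=True)

    def login(self, response):
        # Submit the login form; the field names here are placeholders.
        return FormRequest.from_response(
            response,
            formdata={'session_key': 'user@example.com',
                      'session_password': 'secret'},
            callback=self.check_login_response)

    def check_login_response(self, response):
        if 'Sign Out' in response.body:
            self.log('Successfully logged in. Let start crawling!')
            # Right now I just parse the page I land on after login; nothing
            # in this method hands new requests back to the CrawlSpider rules.
            return self.parse_item(response)
        self.log('Login failed.')

    def parse_item(self, response):
        self.log('Hi, this is an item page! %s' % response.url)
        hxs = HtmlXPathSelector(response)
        item = LinkedconvItem()
        item['title'] = hxs.select('//title/text()').extract()  # placeholder field
        return item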
My output:
C:\Users\ye831c\Documents\Big Data\Scrapy\linkedconv>scrapy crawl LinkedPy
2013-07-12 13:39:40-0500 [scrapy] INFO: Scrapy 0.16.5 started (bot: linkedconv)
2013-07-12 13:39:40-0500 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2013-07-12 13:39:41-0500 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-07-12 13:39:41-0500 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-07-12 13:39:41-0500 [scrapy] DEBUG: Enabled item pipelines:
2013-07-12 13:39:41-0500 [LinkedPy] INFO: Spider opened
2013-07-12 13:39:41-0500 [LinkedPy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-07-12 13:39:41-0500 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2013-07-12 13:39:41-0500 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2013-07-12 13:39:41-0500 [LinkedPy] DEBUG: Crawled (200) <GET https://www.linkedin.com/uas/login> (referer: None)
2013-07-12 13:39:42-0500 [LinkedPy] DEBUG: Redirecting (302) to <GET http://www.linkedin.com/nhome/> from <POST https://www.linkedin.com/uas/login-submit>
2013-07-12 13:39:45-0500 [LinkedPy] DEBUG: Crawled (200) <GET http://www.linkedin.com/nhome/> (referer: https://www.linkedin.com/uas/login)
2013-07-12 13:39:45-0500 [LinkedPy] DEBUG: Successfully logged in. Let start crawling!
2013-07-12 13:39:45-0500 [LinkedPy] DEBUG: Hi, this is an item page! http://www.linkedin.com/nhome/
2013-07-12 13:39:45-0500 [LinkedPy] INFO: Closing spider (finished)
2013-07-12 13:39:45-0500 [LinkedPy] INFO: Dumping Scrapy stats:
    {'downloader/request_bytes': 1670,
     'downloader/request_count': 3,
     'downloader/request_method_count/GET': 2,
     'downloader/request_method_count/POST': 1,
     'downloader/response_bytes': 65218,
     'downloader/response_count': 3,
     'downloader/response_status_count/200': 2,
     'downloader/response_status_count/302': 1,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2013, 7, 12, 18, 39, 45, 136000),
     'log_count/DEBUG': 11,
     'log_count/INFO': 4,
     'request_depth_max': 1,
     'response_received_count': 2,
     'scheduler/dequeued': 3,
     'scheduler/dequeued/memory': 3,
     'scheduler/enqueued': 3,
     'scheduler/enqueued/memory': 3,
     'start_time': datetime.datetime(2013, 7, 12, 18, 39, 41, 50000)}
2013-07-12 13:39:45-0500 [LinkedPy] INFO: Spider closed (finished)