I am trying to use Scrapy to pull some data out of Google Analytics, and although I am a complete Python newbie, I have made some progress: I can now log into Google Analytics with Scrapy, but I need to make an AJAX request to get the data I want. I tried to replicate my browser's HTTP headers with the code below, but it does not work; my error log says:
    ValueError: too many values to unpack
Can anyone help? I have been working on this for two days; I have a feeling I am very close, but I am also very confused.
Here is the code:
    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    from scrapy.http import FormRequest, Request
    from scrapy.selector import Selector
    import logging
    from super.items import SuperItem
    from scrapy.shell import inspect_response
    import json


    class LoginSpider(BaseSpider):
        name = 'super'
        start_urls = ['https://accounts.google.com/ServiceLogin?service=analytics&passive=true&nui=1&hl=fr&continue=https%3A%2F%2Fwww.google.com%2Fanalytics%2Fweb%2F%3Fhl%3Dfr&followup=https%3A%2F%2Fwww.google.com%2Fanalytics%2Fweb%2F%3Fhl%3Dfr#identifier']

        def parse(self, response):
            return [FormRequest.from_response(response,
                                              formdata={'Email': 'Email'},
                                              callback=self.log_password)]

        def log_password(self, response):
            return [FormRequest.from_response(response,
                                              formdata={'Passwd': 'Password'},
                                              callback=self.after_login)]

        def after_login(self, response):
            if "authentication failed" in response.body:
                self.log("Login failed", level=logging.ERROR)
                return
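The statement that actually builds the AJAX request sits at line 42 of mySuper.py (it shows up in the traceback below) and got cut off from the paste above. From what I understand of the Scrapy docs, the headers argument of scrapy.http.Request should be a dict (or a sequence of (name, value) tuples), not raw header lines copied from the browser. Here is a minimal sketch of the shape I believe that request should take; the URL and header values are placeholders, and parse_tastypage is the callback name from my traceback:

    # Sketch only: the URL and header values below are placeholders,
    # not my real ones. Request is already imported at the top of the spider.
    def after_login(self, response):
        if "authentication failed" in response.body:
            self.log("Login failed", level=logging.ERROR)
            return
        # Headers as a dict, the form scrapy.http.Request expects.
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_0)',
            'Accept': 'application/json, text/plain, */*',
            'X-Requested-With': 'XMLHttpRequest',
        }
        return Request('https://www.google.com/analytics/web/SOME_AJAX_ENDPOINT',  # placeholder
                       headers=headers,
                       callback=self.parse_tastypage,
                       dont_filter=True)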
And here is the relevant part of the log:
    2016-03-28 19:11:39 [scrapy] INFO: Enabled item pipelines: []
    2016-03-28 19:11:39 [scrapy] INFO: Spider opened
    2016-03-28 19:11:39 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2016-03-28 19:11:39 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
    2016-03-28 19:11:40 [scrapy] DEBUG: Crawled (200) <GET https://accounts.google.com/ServiceLogin?service=analytics&passive=true&nui=1&hl=fr&continue=https%3A%2F%2Fwww.google.com%2Fanalytics%2Fweb%2F%3Fhl%3Dfr&followup=https%3A%2F%2Fwww.google.com%2Fanalytics%2Fweb%2F%3Fhl%3Dfr#identifier> (referer: None)
    2016-03-28 19:11:46 [scrapy] DEBUG: Crawled (200) <POST https://accounts.google.com/AccountLoginInfo> (referer: https://accounts.google.com/ServiceLogin?service=analytics&passive=true&nui=1&hl=fr&continue=https%3A%2F%2Fwww.google.com%2Fanalytics%2Fweb%2F%3Fhl%3Dfr&followup=https%3A%2F%2Fwww.google.com%2Fanalytics%2Fweb%2F%3Fhl%3Dfr)
    2016-03-28 19:11:50 [scrapy] DEBUG: Redirecting (302) to <GET https://accounts.google.com/CheckCookie?hl=fr&checkedDomains=youtube&pstMsg=0&chtml=LoginDoneHtml&service=analytics&continue=https%3A%2F%2Fwww.google.com%2Fanalytics%2Fweb%2F%3Fhl%3Dfr&gidl=CAA> from <POST https://accounts.google.com/ServiceLoginAuth>
    2016-03-28 19:11:57 [scrapy] DEBUG: Redirecting (302) to <GET https://www.google.com/analytics/web/?hl=fr> from <GET https://accounts.google.com/CheckCookie?hl=fr&checkedDomains=youtube&pstMsg=0&chtml=LoginDoneHtml&service=analytics&continue=https%3A%2F%2Fwww.google.com%2Fanalytics%2Fweb%2F%3Fhl%3Dfr&gidl=CAA>
    2016-03-28 19:12:01 [scrapy] DEBUG: Crawled (200) <GET https://www.google.com/analytics/web/?hl=fr> (referer: https://accounts.google.com/AccountLoginInfo)
    Login Successful!!
    2016-03-28 19:12:01 [scrapy] ERROR: Spider error processing <GET https://www.google.com/analytics/web/?hl=fr> (referer: https://accounts.google.com/AccountLoginInfo)
    Traceback (most recent call last):
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/twisted/internet/defer.py", line 577, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/Users/aminbouraiss/super/super/spiders/mySuper.py", line 42, in after_login
        callback=self.parse_tastypage, dont_filter=True)
      File "/Library/Python/2.7/site-packages/Scrapy-1.1.0rc3-py2.7.egg/scrapy/http/request/__init__.py", line 35, in __init__
        self.headers = Headers(headers or {}, encoding=encoding)
      File "/Library/Python/2.7/site-packages/Scrapy-1.1.0rc3-py2.7.egg/scrapy/http/headers.py", line 12, in __init__
        super(Headers, self).__init__(seq)
      File "/Library/Python/2.7/site-packages/Scrapy-1.1.0rc3-py2.7.egg/scrapy/utils/datatypes.py", line 193, in __init__
        self.update(seq)
      File "/Library/Python/2.7/site-packages/Scrapy-1.1.0rc3-py2.7.egg/scrapy/utils/datatypes.py", line 229, in update
        super(CaselessDict, self).update(iseq)
      File "/Library/Python/2.7/site-packages/Scrapy-1.1.0rc3-py2.7.egg/scrapy/utils/datatypes.py", line 228, in <genexpr>
        iseq = ((self.normkey(k), self.normvalue(v)) for k, v in seq)
    ValueError: too many values to unpack
    2016-03-28 19:12:01 [scrapy] INFO: Closing spider (finished)
    2016-03-28 19:12:01 [scrapy] INFO: Dumping Scrapy stats:
    {'downloader/request_bytes': 6419,
     'downloader/request_count': 5,
     'downloader/request_method_count/GET': 3,
     'downloader/request_method_count/POST': 2,
     'downloader/response_bytes': 75986,
     'downloader/response_count': 5,
     'downloader/response_status_count/200': 3,
     'downloader/response_status_count/302': 2,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2016, 3, 28, 23, 12, 1, 824033),
     'log_count/DEBUG': 6,
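Reading the last frame of the traceback, the failing line is iseq = ((self.normkey(k), self.normvalue(v)) for k, v in seq): every element of the headers sequence gets unpacked into a (key, value) pair. A plain string cannot unpack into exactly two values, so the error reproduces in isolation when headers are passed as raw "Name: value" lines copied from the browser. This is only my guess at what my line 42 is doing, but the traceback matches:

    # Standalone reproduction, assuming headers were passed as raw strings
    # copied from the browser's request instead of as a dict.
    from scrapy.http import Request

    raw_headers = [
        'Accept: application/json, text/plain, */*',
        'X-Requested-With: XMLHttpRequest',
    ]
    # Inside scrapy's Headers this runs "k, v = <each string>", and a long
    # string unpacks into more than two values, raising:
    # ValueError: too many values to unpack
    Request('https://www.google.com/analytics/web/?hl=fr', headers=raw_headers)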