I am trying to scrape data from a page that uses a lot of AJAX calls and javascript execution to render the webpage, so I am trying to use scrapy with selenium for this. The modus operandi is as follows:

1. Log in to the site with scrapy's FormRequest.
2. Check that the login succeeded.
3. Request the authenticated search-results URL.
4. Open that URL in the firefox window that selenium controls.
5. Click through the result pages in selenium and scrape the data.
The code that I have so far is as follows:
from scrapy.spider import BaseSpider
from scrapy.http import FormRequest, Request
from selenium import webdriver
import time

class LoginSpider(BaseSpider):
    name = "sel_spid"
    start_urls = ["http://www.example.com/login.aspx"]

    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        # step 1: submit the login form through scrapy
        return FormRequest.from_response(response,
                                         formdata={'User': 'username', 'Pass': 'password'},
                                         callback=self.check_login_response)

    def check_login_response(self, response):
        # step 2: the page only shows "Log Out" when logged in
        if "Log Out" in response.body:
            self.log("Successfully logged in")
            # step 3: request the search results
            scrape_url = "http://www.example.com/authen_handler.aspx?SearchString=DWT+%3E%3d+500"
            yield Request(url=scrape_url, callback=self.parse_page)
        else:
            self.log("Bad credentials")

    def parse_page(self, response):
        # step 4: hand the results page over to selenium
        self.driver.get(response.url)
        # step 5: page through the results
        next_button = self.driver.find_element_by_class_name('dxWeb_pNext')
        next_button.click()
        time.sleep(2)
The two roadblocks I have hit so far are the following:
Step 4 does not work. When selenium opens the firefox window, it is always at the login screen, presumably because the session scrapy established at login is not shared with selenium's browser, and I do not know how to get past it.
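My best guess is that selenium's firefox keeps its own cookie jar, so it never sees the session scrapy logged in with. Below is a minimal sketch of what I have in mind: copy the cookies scrapy receives into the driver before navigating. I am assuming the session cookie comes back in the Set-Cookie headers of the login response (it might instead be set on an intermediate redirect, I have not checked):

    def check_login_response(self, response):
        if "Log Out" in response.body:
            self.log("Successfully logged in")
            # add_cookie only accepts cookies for the domain the browser is
            # currently on, so load the site once before handing them over
            self.driver.get("http://www.example.com/")
            for header in response.headers.getlist('Set-Cookie'):
                # "name=value; path=/; HttpOnly" -> name, value
                name, _, value = header.split(';')[0].partition('=')
                self.driver.add_cookie({'name': name.strip(), 'value': value.strip()})
            scrape_url = "http://www.example.com/authen_handler.aspx?SearchString=DWT+%3E%3d+500"
            yield Request(url=scrape_url, callback=self.parse_page)
        else:
            self.log("Bad credentials")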
I do not know how to reach step 5, i.e. how to page through the results once selenium is on the right page.
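For step 5, what I imagine is letting selenium click through the pager and handing each rendered page back to scrapy's selectors for extraction, roughly like the sketch below. The xpath for the result rows is a placeholder, and I am assuming the dxWeb_pNext button disappears on the last page:

    from scrapy.selector import Selector
    from selenium.common.exceptions import NoSuchElementException

    def parse_page(self, response):
        self.driver.get(response.url)
        while True:
            # parse the javascript-rendered page with scrapy's selectors
            sel = Selector(text=self.driver.page_source)
            for row in sel.xpath('//table//tr'):  # placeholder xpath
                self.log(row.extract())
            try:
                next_button = self.driver.find_element_by_class_name('dxWeb_pNext')
            except NoSuchElementException:
                break  # assumed: no next button on the last page
            next_button.click()
            time.sleep(2)  # crude wait for the AJAX reload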
Any help would be greatly appreciated.
python selenium scrapy
Amistad Feb 09 '15 at 21:50