Scrapy: next button uses javascript

Question


I am trying to scrape this site http://saintbarnabas.hodesiq.com/joblist.asp?user_id= and I want to get all the RNs in it. I can scrape the data, but I cannot go to the next page because the pagination uses JavaScript. I tried reading other questions, but I do not understand them. This is my code:

 class MySpider(CrawlSpider):
     name = "commu"
     allowed_domains = ["saintbarnabas.hodesiq.com"]
     start_urls = ["http://saintbarnabas.hodesiq.com/joblist.asp?user_id=", ]
     rules = (
         Rule(SgmlLinkExtractor(allow=('\d+'), restrict_xpaths=('*')),
              callback="parse_items", follow=True),
     )

The next button looks like this:

 <a href="Javascript: Move('next')">Next</a>

This pagination is killing me ...


python web-scraping scrapy selenium-webdriver

chano 15 sept. '13 at 9:27


1 answer

Rollingo · Accepted Answer · 2013-09-15T17:04:40+0000

In short, you need to find out what Move('next') does and reproduce it in your code.

A quick look at the site's source shows that the function is defined as follows:

 function Move(strIndicator) {
     document.frm.move_indicator.value = strIndicator;
     document.frm.submit();
 }

And document.frm is a form called "frm":

 <form name="frm" action="joblist.asp" method="post">
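To make that concrete, here is a rough, library-free sketch of the request a browser would send when Move('next') runs: it collects the inputs of the form named "frm", sets move_indicator, and URL-encodes the result. The sample HTML (including the user_id hidden field) and the helper names FormFieldCollector and build_move_request are assumptions for illustration only; the real page probably carries more fields.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormFieldCollector(HTMLParser):
    """Collect the action, method and input values of one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.action = ""
        self.method = "get"
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self.in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_move_request(page_url, html, indicator="next"):
    """Return (method, url, body) equivalent to calling Move(indicator)."""
    collector = FormFieldCollector("frm")
    collector.feed(html)
    fields = dict(collector.fields)
    # This is the assignment Move() performs before calling submit().
    fields["move_indicator"] = indicator
    url = urljoin(page_url, collector.action)
    return collector.method.upper(), url, urlencode(fields)


# Hypothetical, trimmed-down page; the real form likely has more hidden fields.
SAMPLE_HTML = """
<form name="frm" action="joblist.asp" method="post">
  <input type="hidden" name="move_indicator" value="">
  <input type="hidden" name="user_id" value="">
</form>
"""

method, url, body = build_move_request(
    "http://saintbarnabas.hodesiq.com/joblist.asp?user_id=", SAMPLE_HTML
)
print(method, url, body)
```

Nothing is sent over the network here; the point is only to show which method, URL and body the form submission boils down to.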

So basically you need to build a request that POSTs this form with move_indicator set to 'next'. This is easy to do with the FormRequest class (see the documentation), for example:

 return FormRequest.from_response(response, formname="frm", formdata={'move_indicator': 'next'})

This approach works in most cases. The hard part is figuring out what the JavaScript code does; sometimes it is obfuscated or deliberately overcomplicated to prevent scraping.
