Scrapy: next button uses javascript

I am trying to scrape this site http://saintbarnabas.hodesiq.com/joblist.asp?user_id= and I want to get all the RNs in it. I can scrape the data but cannot get to the next page, because the Next link uses JavaScript. I tried reading other questions about this, but I did not understand them. This is my code:

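 # Imports as used by older Scrapy releases that still ship SgmlLinkExtractor.
 from scrapy.contrib.spiders import CrawlSpider, Rule
 from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

 class MySpider(CrawlSpider):
     name = "commu"
     allowed_domains = ["saintbarnabas.hodesiq.com"]
     start_urls = ["http://saintbarnabas.hodesiq.com/joblist.asp?user_id="]

     # Follow every link whose URL contains digits and pass it to parse_items.
     rules = (
         Rule(SgmlLinkExtractor(allow=(r'\d+',), restrict_xpaths=('*',)),
              callback="parse_items", follow=True),
     )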

The Next button looks like this:

 <a href="Javascript: Move('next')">Next</a> 

This pagination is killing me ...

1 answer

In short, you need to find out what Move('next') does and reproduce it in your code.

A quick look at the site's source shows that the function is defined as follows:

 function Move(strIndicator) {
     document.frm.move_indicator.value = strIndicator;
     document.frm.submit();
 }

And document.frm is a form called "frm":

 <form name="frm" action="joblist.asp" method="post"> 

So basically you need to build a POST request for this form with move_indicator set to 'next'. This is easy to do with the FormRequest class (see the documentation), for example:

 return FormRequest.from_response(response, formname="frm", formdata={'move_indicator': 'next'}) 

This method works in most cases. The hard part is figuring out what the JavaScript code does: sometimes it is obfuscated and does overly complicated things just to avoid being scraped.
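When working out what a JavaScript handler really submits, it can help to build the request by hand and inspect it before wiring it into a spider. The sketch below uses only the Python standard library and never sends anything. The helper names `FormFieldParser` and `build_move_request` are made up for illustration, and the `page` hidden field in the sample HTML is hypothetical; only the form name, its action and the move_indicator field come from the site.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FormFieldParser(HTMLParser):
    """Collect the action and <input> name/value pairs of one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Simplification: skip buttons, ignore checkbox/radio state.
            kind = (attrs.get("type") or "").lower()
            if kind not in ("submit", "button", "image", "reset"):
                self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_move_request(page_url, page_html, indicator="next"):
    """Build (without sending) the POST that Move(indicator) would trigger."""
    parser = FormFieldParser("frm")
    parser.feed(page_html)
    fields = dict(parser.fields)
    fields["move_indicator"] = indicator  # what Move() sets before submit()
    body = urlencode(fields).encode("ascii")
    target = urljoin(page_url, parser.action or "")
    return Request(target, data=body, method="POST")


# Sample markup; the "page" field is hypothetical.
sample_html = """
<form name="frm" action="joblist.asp" method="post">
  <input type="hidden" name="move_indicator" value="">
  <input type="hidden" name="page" value="1">
</form>
"""
req = build_move_request(
    "http://saintbarnabas.hodesiq.com/joblist.asp?user_id=", sample_html
)
print(req.get_method(), req.full_url)
print(req.data.decode())
```

Comparing this output with the request shown in the browser's network tab when you click Next is a quick way to confirm that nothing else is being set by the script.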


Source: https://habr.com/ru/post/1502245/

