Scraping aspx with Python Mechanize - submitting a JavaScript form

I am trying to scrape the results of a UK Food Ratings Agency aspx search (e.g. http://ratings.food.gov.uk/QuickSearch.aspx?q=po30 ) using Mechanize / Python on scraperwiki ( http://scraperwiki.com/scrapers/food_standards_agency/ ), but I run into a problem when trying to follow the "next" page links, which have the form:

<input type="submit" name="ctl00$ContentPlaceHolder1$uxResults$uxNext" value="Next >" id="ctl00_ContentPlaceHolder1_uxResults_uxNext" title="Next >" /> 

The form handler looks like this:

    <form method="post" action="QuickSearch.aspx?q=po30"
          onsubmit="javascript:return WebForm_OnSubmit();"
          onkeypress="javascript:return WebForm_FireDefaultButton(event, 'ctl00_ContentPlaceHolder1_buttonSearch')"
          id="aspnetForm">
    <input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
    <input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
    <input type="hidden" name="__LASTFOCUS" id="__LASTFOCUS" value="" />

An HTTP trace taken while I manually click the Next link shows __EVENTTARGET as empty. All the examples I can find in other scrapers manipulate __EVENTTARGET as the way to handle the following pages.

Indeed, I'm not sure how the page I want to scrape ever loads the next page. Whatever I throw at the scraper, it only ever manages to load the first page of results. (Even being able to change the number of results per page would be useful, but I can't see how to do that either!)

So, any ideas on how to scrape results pages 1+N for N>0?

+6
2 answers

Mechanize does not handle JavaScript, but JavaScript is not needed for this specific case.
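To illustrate why no JavaScript is involved: when a submit button is pressed, the browser adds that button's own name/value pair to the POST data, so the server can tell which button was used even though __EVENTTARGET stays empty. The sketch below builds such a body with the Python 3 standard library, using only the fields shown in the question. The real form very likely carries more hidden fields than these three, and `build_post_body` is a hypothetical helper name, not part of mechanize.

```python
from urllib.parse import urlencode, parse_qs

# Hidden fields exactly as they appear in the question's form (all empty).
HIDDEN_FIELDS = [
    ('__EVENTTARGET', ''),
    ('__EVENTARGUMENT', ''),
    ('__LASTFOCUS', ''),
]

# The "Next" submit button from the question: (name, value).
NEXT_BUTTON_PAIR = ('ctl00$ContentPlaceHolder1$uxResults$uxNext', 'Next >')


def build_post_body(hidden_fields, clicked_button):
    """Return the form-encoded body a browser would send for this click.

    Only the submit button that was actually pressed is included;
    the other submit buttons on the form are left out.
    """
    return urlencode(hidden_fields + [clicked_button])


body = build_post_body(HIDDEN_FIELDS, NEXT_BUTTON_PAIR)
print(body)

# Decode it again to see the fields the server would receive.
fields = parse_qs(body, keep_blank_values=True)
print(fields['__EVENTTARGET'])        # still empty
print(fields[NEXT_BUTTON_PAIR[0]])    # the pressed button's value
```

This matches the HTTP trace described in the question: __EVENTTARGET is empty, and the button name in the body is what identifies the "Next" action.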

First we open the results page using mechanize:

    import mechanize

    url = 'http://ratings.food.gov.uk/QuickSearch.aspx?q=po30'
    br = mechanize.Browser()
    br.set_handle_robots(False)  # do not fetch or obey robots.txt
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    br.open(url)
    response = br.response().read()

Then we select the aspnet form:

 br.select_form(nr=0) #Select the first (and only) form - it has no name so we reference by number 

The form has 5 submit buttons - we want to submit the one that takes us to the next page of results:

 response = br.submit(name='ctl00$ContentPlaceHolder1$uxResults$uxNext').read() #"Press" the next submit button 
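Repeating this step gives pages 1+N. The following is a minimal sketch of such a loop, not a tested scraper for this site. It assumes that the last results page no longer contains the Next button's name in its HTML (this has not been verified against the site), and it caps the number of pages as a safety net. `iter_result_pages` and `max_pages` are hypothetical names; `br` is expected to be a mechanize browser that already has the first results page open.

```python
NEXT_BUTTON = 'ctl00$ContentPlaceHolder1$uxResults$uxNext'


def iter_result_pages(br, max_pages=20):
    """Yield (page_number, html) for each results page.

    `br` must offer response(), select_form(nr=...) and submit(name=...)
    the way mechanize.Browser does, with a results page already open.
    """
    html = br.response().read()
    for page_number in range(1, max_pages + 1):
        yield page_number, html

        # read() may return bytes or str depending on the Python version.
        text = html.decode('utf-8', 'replace') if isinstance(html, bytes) else html

        # Assumption: no Next button in the markup means this was the last page.
        if page_number == max_pages or NEXT_BUTTON not in text:
            break

        br.select_form(nr=0)                        # the single aspnet form
        html = br.submit(name=NEXT_BUTTON).read()   # "press" Next
```

With the browser opened as above, `for page_number, html in iter_result_pages(br):` would hand each page's HTML to whatever parsing code follows. If the site keeps the button in the markup on the last page (for example, disabled), the stop condition needs to be adapted.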

Other submit buttons in the form:

    ctl00$uxLanguageSwitch                          # Switch language to Welsh
    ctl00$ContentPlaceHolder1$uxResults$Button1     # Search submit button
    ctl00$ContentPlaceHolder1$uxResults$uxFirst     # First result page
    ctl00$ContentPlaceHolder1$uxResults$uxPrevious  # Previous result page
    ctl00$ContentPlaceHolder1$uxResults$uxLast      # Last result page

In mechanize, we can get the form information like this:

    for form in br.forms():
        print(form)  # lists every control on the form, including the submit buttons
+8

Mechanize does not process JavaScript.

There are many ways to handle this, including QtWebKit , python-spidermonkey , HtmlUnit (using Jython), or SeleniumRC .

Here's how to do it with SeleniumRC:

    import selenium

    sel = selenium.selenium("localhost", 4444, "*firefox", "http://ratings.food.gov.uk")
    sel.start()
    sel.open("QuickSearch.aspx?q=po30")
    sel.click('ctl00$ContentPlaceHolder1$uxResults$uxNext')

See also these related SO questions:

+2

Source: https://habr.com/ru/post/888952/

