Beautifulsoup and mechanize to get the result of an ajax call

Question

Beautifulsoup and mechanize to get the result of an ajax call

Hi, I'm creating a scraper using python 2.5 and beautifulsoup but in case of a problem ... the part of the webpage is generated after the user clicks on some button, start the ajax request by calling a specific javacsript function using the correct parameters

Is there a way to simulate user interaction and get this result? I come across a mechanization module, but it seems to me that this is mainly used for working with forms ...

I would appreciate any links or some code examples thanks

+4

python ajax beautifulsoup mechanize scraper

nabizan Apr 9 '10 at 19:01

source share

2 answers

No, you cannot do it easily. AFAIK your options are easiest:

Read the AJAX javascript code yourself, as a human programmer, understand it, and then write python code to simulate AJAX calls manually. You can also use some capture software to capture requests / responses made in real time and try to play them using code;
Use selenium or some other browser automation tool to get the page in a real web browser;
Use some javascript runner for python like spidermonkey or pyv8 to run javascript code and bind it to your copy of HTML dom;

+3

nosklo Apr 9 '10 at 19:19

source share

nabizan · Accepted Answer · 2010-04-10T12:59:27+0000

ok, so I realized that it was pretty simple after I realized that I can use a combination of urllib, ulrlib2 and beautifulsoup

import urllib, urllib2 from BeautifulSoup import BeautifulSoup as bs_parse data = urllib.urlencode(values) req = urllib2.Request(url, data) res = urllib2.urlopen(req) page = bs_parse(res.read())

Beautifulsoup and mechanize to get the result of an ajax call

More articles: