Overview:
All screen scraping first requires a manual review of the page you want to extract resources from. When dealing with AJAX, you usually need to analyze a bit more than just the HTML.
With AJAX, the value you want is not in the initial HTML document you requested. Instead, JavaScript on the page runs and asks the server for the extra information.
You can therefore usually analyze the JavaScript, see which request it makes, and call that URL directly.
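As a rough sketch of that analysis step, the following Python snippet searches a script's source text for XMLHttpRequest-style open calls and lists the method and URL of each. The function name find_xhr_requests and the regular expression are illustrative assumptions; the pattern only catches calls whose method and URL are written as literal strings, so URLs assembled at runtime still need manual reading.

```python
import re

# Matches calls such as xmlHttp.open("GET", "time.asp", true).
# Only literal string arguments are recognized (an assumption of this sketch).
OPEN_CALL = re.compile(
    r"""\.open\(\s*["'](?P<method>[A-Za-z]+)["']\s*,\s*["'](?P<url>[^"']+)["']"""
)


def find_xhr_requests(script_text):
    """Return (method, url) pairs for literal .open(method, url, ...) calls."""
    return [
        (match.group("method").upper(), match.group("url"))
        for match in OPEN_CALL.finditer(script_text)
    ]
```

Running it over the text of each script element on a page gives a short list of candidate URLs to try by hand.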
Example:
As an example, suppose the page you want to scrape has the following script:
<script type="text/javascript">
function ajaxFunction()
{
  var xmlHttp;
  try
  {
    // Firefox, Opera 8.0+, Safari
    xmlHttp=new XMLHttpRequest();
  }
  catch (e)
  {
    // Internet Explorer
    try
    {
      xmlHttp=new ActiveXObject("Msxml2.XMLHTTP");
    }
    catch (e)
    {
      try
      {
        xmlHttp=new ActiveXObject("Microsoft.XMLHTTP");
      }
      catch (e)
      {
        alert("Your browser does not support AJAX!");
        return false;
      }
    }
  }
  xmlHttp.onreadystatechange=function()
  {
    if(xmlHttp.readyState==4)
    {
      document.myForm.time.value=xmlHttp.responseText;
    }
  }
  xmlHttp.open("GET","time.asp",true);
  xmlHttp.send(null);
}
</script>
Then all you have to do is send an HTTP request to time.asp on the same server instead. The example is from w3schools.
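A minimal sketch of that direct request in Python, using only the standard library. The helper names endpoint_url and fetch_text and the page address in the usage note are hypothetical; the only detail taken from the script is the relative URL time.asp, which is resolved against the address of the page that contains the script.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def endpoint_url(page_url, relative_url="time.asp"):
    """Resolve the URL passed to xmlHttp.open against the page's own URL."""
    return urljoin(page_url, relative_url)


def fetch_text(url, timeout=10):
    """Send a plain GET request and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For a page at the hypothetical address http://example.com/ajax/page.html, fetch_text(endpoint_url("http://example.com/ajax/page.html")) would request http://example.com/ajax/time.asp and return the same text the script writes into the form field.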
Advanced Scraper with C++:
For complex cases, and if you are using C++, you could also consider using SpiderMonkey, the Firefox JavaScript engine, to execute the JavaScript on a page.
Advanced Scraper with Java:
For complex cases, and if you are using Java, you could also consider using Rhino, the Firefox JavaScript engine for Java.
Advanced Scraper with .NET:
For complex cases, and if you are using .NET, you could also consider using the Microsoft.Vsa assembly. It has recently been replaced by ICodeCompiler / CodeDOM.
Brian R. Bondy, Nov 04 '08 at 2:24