How can I scrape a webpage using jQuery and XPath?

Question


Using Firebug, I can insert a link to jQuery into a web page's header. Then I can run a script to scrape that page and the pages it links to.

How do I start writing this script in jQuery, or in JavaScript in general? Is there an interface in jQuery/JavaScript that lets me use XPath to access the elements on the page (and on the pages it links to)?


javascript jquery xpath web-scraping

dangerChihuahua007 Mar 08 '12 at 15:32


3 answers

JP Richardson · Answer 1 · 2012-03-08T16:28:06+0000

First, you need a JavaScript runtime outside the browser; the most common one is Node.js. Then you need a way to build a client-side DOM, which is usually done with jsdom.

So your script should:

load the HTML page (jsdom does this for you, but you can also use request)
create a client-side DOM
query the DOM with jQuery

Here is an example Node.js script:

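 // Uses the old jsdom.env() API from jsdom releases of that time; later versions removed it.
 var jsdom = require("jsdom");

 jsdom.env(
   "http://nodejs.org/dist/",                      // page to scrape
   ["http://code.jquery.com/jquery-1.5.min.js"],   // scripts to inject (jQuery)
   function (errors, window) {
     // window.$ is the injected jQuery, bound to the scraped page's DOM
     console.log("there have been", window.$("a").length, "nodejs releases!");
   }
 );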

You run it like this:

 $ node scrape.js

Remember to install jsdom first:

 $ npm install --production jsdom

austincheney · Answer 2 · 2012-03-08T16:09:22+0000

You can quickly get the page's HTML like this:

 var html = document.documentElement.innerHTML;

This only returns a string, and it does not include the root <html> element itself (document.documentElement.outerHTML does).
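For the XPath part of the question: jQuery's selectors are CSS-based, but when the script runs inside the page (as in this answer, or from the Firebug console), the browser's own document.evaluate() understands XPath. Below is a minimal sketch of a helper that gathers all matches into an array; the name xpathAll is hypothetical and is not part of jQuery or the DOM.

```javascript
// Hypothetical helper: return every node matching an XPath expression as an array.
// Intended for the browser, where document.evaluate and XPathResult are built in.
function xpathAll(expr, contextNode = document) {
  // evaluate() is a Document method; for an element context, use its owner document.
  const doc = contextNode.ownerDocument || contextNode;
  const snapshot = doc.evaluate(
    expr,
    contextNode,
    null, // no namespace resolver needed for plain HTML
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, // matches in document order
    null
  );
  const nodes = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}
```

For example, xpathAll("//a/@href").map(attr => attr.value) lists the raw link targets, and $(xpathAll("//table//td")) hands the matches to jQuery. A snapshot is used rather than an iterator result because an iterator becomes invalid if the document changes while you walk it.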

nrabinowitz · Answer 3 · 2012-03-17T20:08:41+0000

You may be interested in pjscrape, a web-scraping library built for exactly this purpose (disclaimer: this is my project). It is based on PhantomJS, a headless WebKit implementation that you can run from the command line, and it has a very simple syntax for scraping data from multiple pages and for finding additional URLs to spider and scrape.
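Whichever tool does the fetching, following "the pages it links to" needs some bookkeeping: resolve each href against the current page, drop non-HTTP links, and skip URLs that were already visited. Here is a small sketch using the built-in URL class (available in Node and in browsers). The same-host restriction and the name nextUrls are assumptions for illustration, not pjscrape's behavior or API.

```javascript
// Hypothetical helper: turn raw href values from one page into absolute URLs to visit next.
// Assumed policy: only http(s) links on the same host, ignore #fragments, skip URLs in `seen`.
function nextUrls(hrefs, pageUrl, seen = new Set()) {
  const base = new URL(pageUrl);
  const result = [];
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, base); // resolves relative links such as "../b.html"
    } catch {
      continue; // not a valid URL at all
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") continue; // mailto:, javascript:, ...
    if (url.host !== base.host) continue; // stay on the same site
    url.hash = ""; // "page#a" and "page#b" are the same page
    if (seen.has(url.href)) continue;
    seen.add(url.href); // mark as queued so it is not returned twice
    result.push(url.href);
  }
  return result;
}
```

In a browser console the input could be Array.from(document.links, a => a.getAttribute("href")); each returned URL would then be fetched and scraped in turn, passing the same seen set along.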
