Just to chime in: I ran into this exact problem (I have a very heavy AJAX/JS site) and found something that might be of interest:
crawlme
I have yet to try it, but it looks like it will make the whole process a piece of cake if it works as advertised! It's a piece of Connect/Express middleware that you simply plug in before any of your page routes, and it seems to take care of everything else.
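For reference, dropping that kind of middleware in typically looks something like this (a minimal sketch; I'm assuming crawlme exports a middleware factory the way its README describes, so double-check against the real docs):

var express = require('express');
var crawlme = require('crawlme');

var app = express();

// Mount the crawler middleware before any routes or static handlers,
// so it can intercept crawler requests and serve a rendered snapshot
// while normal visitors fall through to the regular app.
app.use(crawlme());
app.use(express.static(__dirname + '/public'));

app.listen(3000);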
Edit:
Having tried crawlme, I had some success, but the headless browser it uses (zombie.js) was failing on some of my JavaScript content, probably because it works by emulating the DOM and therefore won't be perfect.
Sooo, instead I got hold of a full headless browser, phantomjs, and a set of node bindings for it:
npm install phantomjs node-phantom
Then I wrote my own script, similar to crawlme, but using phantomjs instead of zombie.js. This approach seems to work just fine and renders every one of my AJAX-based pages perfectly. The script I wrote to do this can be found here. Using it is simple:
var googlebot = require("./path-to-file");
and then, before any other routes/middleware in your application (this is using express, but should also work with connect):
app.use(googlebot());
The source is pretty simple, minus a couple of regular expressions, so have a gander :)
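In case you just want the general shape of it, here's a rough sketch of that kind of middleware using node-phantom. It's simplified, and the node-phantom callback signatures here are from memory rather than copied from my actual source, so treat the details as approximate:

var phantom = require('node-phantom');

module.exports = function googlebot() {
  return function(req, res, next) {
    // Google's AJAX crawling scheme: a crawler requests
    // ?_escaped_fragment_=... when it wants a pre-rendered snapshot
    // of a #! page. Normal visitors fall straight through to the app.
    var fragment = req.query._escaped_fragment_;
    if (fragment === undefined) return next();

    // Rebuild the URL the crawler is really after, e.g. /#!/some/page
    var url = 'http://' + req.headers.host + req.path +
              (fragment ? '#!' + fragment : '');

    phantom.create(function(err, ph) {
      if (err) return next(err);
      ph.createPage(function(err, page) {
        if (err) return next(err);
        page.open(url, function(err, status) {
          if (err || status !== 'success') { ph.exit(); return next(err); }
          // Give the client-side JS a moment to render, then pull out the DOM.
          setTimeout(function() {
            page.evaluate(function() {
              return document.documentElement.outerHTML;
            }, function(err, html) {
              ph.exit();
              if (err) return next(err);
              res.send(html);
            });
          }, 1000);
        });
      });
    });
  };
};

A real version would want to reuse or pool the phantom process and cache the rendered snapshots rather than spinning up a fresh browser for every crawler request.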
Result: an AJAX-heavy node.js / connect / express website becomes crawlable by googlebot.