A pure JavaScript solution for Google's AJAX crawling spec

I have a project that is heavily dependent on JavaScript (node.js, backbone.js, etc.). I use hashbang URLs like /#!/about and have read Google's AJAX crawling spec. I've done a small amount of headless UI testing with zombie, and I can roughly imagine how this could be done by adding a small delay and returning static content to the Google bot. But I really don't want to implement this from scratch and was hoping there is already an existing library that fits into my stack. Does anyone know of one?

EDIT: At the time of writing, I don't think such a library exists. However, rendering with Backbone (or something similar) on both the server and the client is a plausible approach (even if not a direct answer), so I'm going to mark that as the answer, although there may be better solutions in the future.
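For context, here is a minimal sketch of the server-side contract the spec defines. Express is an assumption here (the question only mentions node.js), and the response is just a placeholder:

    var express = require('express');
    var app = express();

    // Per the spec, a crawler rewrites  http://example.com/#!/about
    // into  http://example.com/?_escaped_fragment_=/about  and expects
    // a static HTML snapshot of what the client-side app would render.
    app.use(function (req, res, next) {
      var fragment = req.query._escaped_fragment_;
      if (fragment === undefined) return next(); // normal browser traffic
      // A real implementation would render the route headlessly here.
      res.send('<!-- static snapshot for /#!' + fragment + ' -->');
    });

    app.listen(3000);

Pages without a hashbang can opt in to the same scheme with a `<meta name="fragment" content="!">` tag.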

+6
4 answers

There is one implementation that runs node.js and Backbone.js both on the server and in the browser: https://github.com/Morriz/backbone-everywhere

+2

Just to chime in, I ran into this problem too (I have a very ajax/js-heavy site) and I found this, which may be of interest:

crawlme

I have yet to try it, but it sounds like it will make the whole process a piece of cake if it works as advertised! It is a piece of connect/express middleware that you simply insert before any page routes, and it appears to take care of the rest.

Edit:

Having tried crawlme, I had some success, but the headless browser it uses (zombie.js) choked on some of my javascript content, probably because it works by emulating the DOM and is therefore not perfect.

Sooo, instead I grabbed a full headless webkit browser, phantomjs, and a set of node bindings for it:

npm install phantomjs node-phantom 

Then I created my own script, similar to crawlme, but using phantomjs instead of zombie.js. This approach seems to work fine and renders every one of my ajax-based pages perfectly. The script I wrote to pull this off can be found here. Using it is simple:

 var googlebot = require("./path-to-file"); 

and then, before any other routes in your application (this uses express, but should also work with plain connect):

 app.use(googlebot()); 

The source is reasonably simple, minus a few regular expressions, so have a gander :)

Result: an AJAX-heavy node.js / connect / express website becomes crawlable by googlebot.
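For reference, here is a minimal sketch of what such a middleware might look like. It is not the script linked above; the node-phantom callback style, the Express request helpers, and the one-second render delay are assumptions:

    var phantom = require('node-phantom');

    // Connect/express middleware: when a crawler asks for
    // /?_escaped_fragment_=/some/route, open the real hashbang URL in
    // phantomjs, wait briefly for the client-side JS to render, and
    // return the resulting HTML snapshot.
    module.exports = function googlebot() {
      return function (req, res, next) {
        var fragment = req.query._escaped_fragment_;
        if (fragment === undefined) return next(); // not a crawler request

        var url = req.protocol + '://' + req.get('host') + '/#!' + fragment;

        phantom.create(function (err, ph) {
          if (err) return next(err);
          ph.createPage(function (err, page) {
            if (err) return next(err);
            page.open(url, function (err, status) {
              if (err || status !== 'success') return next(err);
              // Crude: give the app a second to finish rendering.
              setTimeout(function () {
                page.evaluate(function () {
                  return document.documentElement.outerHTML;
                }, function (err, html) {
                  ph.exit();
                  if (err) return next(err);
                  res.send(html);
                });
              }, 1000);
            });
          });
        });
      };
    };

In production you would probably want to cache the snapshots rather than spin up a phantomjs instance per crawler request.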

+10

The crawlable nodejs module seems suitable for this purpose: https://npmjs.org/package/crawlable, and here is an example of such an SPA that can be rendered server-side in node: https://github.com/trupin/crawlable-todos

+1

Source: https://habr.com/ru/post/906419/

