I'm not very familiar with the internals of Node.js, but as far as I know, you get "Maximum call stack size exceeded" errors when the call stack grows too deep, typically through recursion.

I made a spider that follows links, and I started getting these errors after a seemingly random number of requests. Node doesn't give you a stack trace when this happens, but I'm fairly sure there is no runaway recursion in my own code.
I use request to fetch URLs and cheerio to parse the returned HTML and discover new links. The stack overflows always occurred inside cheerio. When I switched from cheerio to htmlparser2, the errors disappeared. htmlparser2 is much lighter because it simply emits an event for every open tag instead of parsing the entire document and building a tree.
My theory is that cheerio was exhausting the call stack while building its tree, but I'm not sure whether that is even possible?
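It is possible in principle: any recursive tree walk overflows the stack once the structure is deep enough, and the error is exactly the one described. A standalone illustration (not the crawler code, just the failure mode):

```javascript
// Build an artificially deep linked structure, then walk it recursively.
// The recursion depth equals the nesting depth, so a sufficiently deep
// input triggers "RangeError: Maximum call stack size exceeded".
function depth(node) {
  return node.child ? 1 + depth(node.child) : 0;
}

var root = {};
var cur = root;
for (var i = 0; i < 1000000; i++) {
  cur.child = {};
  cur = cur.child;
}

try {
  depth(root);
} catch (e) {
  console.log(e.message); // Maximum call stack size exceeded
}
```

Whether cheerio's parser actually recurses per nesting level depends on its internals, but this is the general mechanism by which a parser can hit the limit without any recursion in the caller's code.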
Here is a simplified version of my code (it's for reading only; it won't run as-is):
```javascript
var _ = require('underscore');
var fs = require('fs');
var urllib = require('url');
var request = require('request');
var cheerio = require('cheerio');

var mongo = "This is a global connection to mongodb.";
var maxConc = 7;

var crawler = {
  concurrent: 0,
  queue: [],
  fetched: {},

  fetch: function (url) {
    var self = this;
    self.concurrent += 1;
    self.fetched[url] = 0;

    request.get(url, { timeout: 10000, pool: { maxSockets: maxConc } },
      function (err, response, body) {
        self.concurrent -= 1;
        self.fetched[url] = 1;
        self.extract(url, body);
      });
  },

  extract: function (referrer, data) {
    var self = this;
    var urls = [];

    mongo.pages.insert({ _id: referrer, html: data, time: +(new Date) });

    cheerio.load(data)('a').each(function () {
      var href = resolve(this.attribs.href, referrer);
      // ... (the rest of the snippet was cut off in the question)
    });
  }
};
```