Node.js: requesting a web page with asynchronous scripts

I am loading a webpage using the request module, which is very simple.

My problem is that the page I'm trying to load has some async scripts (script tags with the async attribute), and they are not loaded along with the html document that comes back from the http request.

My question is: how can I make an http request, with or without the request module (preferably with), and get the WHOLE page loaded, including the async scripts described above, which go missing in some edge cases?

+4
2 answers

It looks like you are trying to do webscraping using Javascript.

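request only fetches the raw html; it does not execute anything on the page. To parse the response you need something else, such as cheerio.

x-ray is a scraping library; with x-ray you select elements using jQuery-style selectors.

nightmare drives a real browser, so scripts on the page actually run. That makes it suitable for pages that load content via ajax.

HTH!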

+2

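If I understand correctly, you also want to load the scripts marked async.

The html response only references those scripts, so each one has to be fetched with its own request. Here is a simple test: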

Suppose the html page contains a script tag with the async attribute, for example: <script src="abc.js" async></script>

(The page is served locally with httpster.)

"use strict";

const request = require('request');

const options1 = { url: 'http://localhost:3333/' };

// hard-coded script name for test purposes
const options2 = { url: 'http://localhost:3333/abc.js' };

// start from an empty string; appending to undefined would prefix "undefined"
let htmlData = '';  // store html page here

request.get(options1)
    .on('response', resp => resp.on('data', d => htmlData += d))
    .on('end', () => {
        let scripts = ''; // store scripts here

        // htmlData now contains the webpage
        // Use an html parser to find all script tags with the async attribute
        // and resolve their urls against the page url
        // NOT DONE FOR THIS EXAMPLE

        request.get(options2)
            .on('response', resp => resp.on('data', d => scripts += d))
            .on('end', () => {
                // both values are already strings, so plain concatenation is enough
                const allData = htmlData + scripts;
                console.log(allData);
            })
            .on('error', err => console.log(err));
    })
    .on('error', err => console.log(err));

In a real case you would parse the html to find the js file urls instead of hard-coding them.
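The step marked "NOT DONE FOR THIS EXAMPLE" in the code above could look roughly like the sketch below. It is only an illustration: findAsyncScriptUrls is a hypothetical helper name, not part of the request module, and it uses a regular expression where a proper html parser (cheerio, for instance) would be more robust against unusual markup. It uses only the URL class built into Node.

```javascript
"use strict";

// Hypothetical helper (an assumption for illustration, not a request API):
// returns the absolute urls of all <script> tags that have both an async
// attribute and a src attribute.
function findAsyncScriptUrls(html, baseUrl) {
    const urls = [];
    // Match every opening <script ...> tag and capture its attribute text.
    // A regex is a simplification; it will miss some exotic markup.
    const tagPattern = /<script\b([^>]*)>/gi;
    let match;
    while ((match = tagPattern.exec(html)) !== null) {
        const attrs = match[1];
        // async must be a standalone attribute name, not part of another word
        const isAsync = /(^|\s)async(\s|=|$)/i.test(attrs);
        // src may be double-quoted, single-quoted, or unquoted
        const src = /(?:^|\s)src\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/i.exec(attrs);
        if (isAsync && src) {
            const value = src[1] !== undefined ? src[1]
                        : src[2] !== undefined ? src[2]
                        : src[3];
            // Resolve relative paths against the url of the page itself
            urls.push(new URL(value, baseUrl).href);
        }
    }
    return urls;
}

// Example page matching the test setup above, plus one non-async script
const page = '<html><head><script src="abc.js" async></script>' +
             '<script src="sync.js"></script></head></html>';
console.log(findAsyncScriptUrls(page, 'http://localhost:3333/'));
```

For the test page this yields a single entry, http://localhost:3333/abc.js, which could then be passed as the url of the second request.get call instead of the hard-coded options2.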

0

Source: https://habr.com/ru/post/1625565/

