Best way to retrieve a URL from a webpage loaded via XMLHttpRequest?

Problem Overview

  • I have a dynamically created webpage, X, which consists of search results linking to webpages Y1, Y2, Y3, etc.
  • Y1 contains the URL of resource R1, Y2 contains the URL of resource R2, etc.
  • I would like to dynamically enhance page X with links to resources R1, R2, etc.

Possible Solution

I am currently thinking of using JavaScript and XMLHttpRequest to fetch the HTML of pages Y1, Y2, etc., and then using a regular expression to extract the URL.

Pages Y1 , Y2 , etc. are in the region of 30-100 KB of HTML each.

Does that sound like a good plan? Or am I better off getting every web page in JSON format and retrieving the resource URL from there? If HTML is the way to go, do you have any suggested optimizations or shortcuts for searching 30-100 KB of text?

1 answer

You do not want to use a regex to extract the URL. I suggest using jQuery to make the AJAX request, and then using jQuery again to parse the HTML returned by the server and pick out the links.

 jQuery.ajax({
   url: "http://my.url.here",
   dataType: "html",
   ...
   success: function(data) {
     // "data" is the HTML string returned by the server
     jQuery("a", data).each(function() {
       var $link = jQuery(this);
       ...
       ...
     });
   }
   ...
 });

If jQuery is not an option, you can do something like this when you get the response:

 var html = XHR.responseText;
 var div = document.createElement("div");
 div.innerHTML = html;
 // You can now search for nodes inside your div.
 // The following gives you all the anchor tags:
 div.getElementsByTagName('a');
 ...
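One detail worth handling once you have the anchors: links inside page Y may be relative, and they should be resolved against Y's URL rather than against page X. Below is a minimal sketch of that step, assuming the raw href strings have already been collected from the anchors. The helper name resolveHrefs and the example.com-style URLs are hypothetical and used only for illustration; the standard URL constructor does the actual resolving.

```javascript
// Hypothetical helper (not part of the original answer): resolve raw href
// values taken from page Y against Y's own URL.
function resolveHrefs(rawHrefs, pageUrl) {
  var resolved = [];
  rawHrefs.forEach(function (href) {
    if (!href) {
      return; // skip anchors with a missing or empty href
    }
    try {
      // new URL(relative, base) returns the absolute form of the link
      resolved.push(new URL(href, pageUrl).href);
    } catch (e) {
      // the href could not be parsed as a URL; ignore it
    }
  });
  return resolved;
}

// Example with made-up values standing in for the hrefs found in page Y1
var hrefs = ["/files/r1.pdf", "docs/page.html", "http://other.example/r.zip", null];
var links = resolveHrefs(hrefs, "http://site.example/results/y1.html");
```

In the browser, the raw values would most likely come from calling getAttribute('href') on each anchor in the div, since the anchor's href property is resolved against the current document (page X) rather than page Y.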

Source: https://habr.com/ru/post/1383259/

