How to time out a request using the Html Agility Pack

I am making a request to a remote web server that is currently disabled (on purpose).

I would like to figure out how best to time out a request. Basically, if the request takes longer than "X" milliseconds, then abandon the request and return a null response.

Currently, the web request just sits there, waiting for a response.

What is the best way to approach this problem?

Here is the current code snippet:

    public JsonpResult About(string HomePageUrl)
    {
        Models.Pocos.About about = null;
        if (HomePageUrl.RemoteFileExists())
        {
            // Using the Html Agility Pack, we want to extract only the
            // appropriate data from the remote page.
            HtmlWeb hw = new HtmlWeb();
            HtmlDocument doc = hw.Load(HomePageUrl);
            HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='wrapper1-border']");
            if (node != null)
            {
                about = new Models.Pocos.About { html = node.InnerHtml };
            }
            //todo: look into whether this else statement is necessary
            else
            {
                about = null;
            }
        }
        return this.Jsonp(about);
    }
+6
4 answers

I had to make a small adjustment to my originally published code:

    public JsonpResult About(string HomePageUrl)
    {
        Models.Pocos.About about = null;
        // ************* CHANGE HERE - added "timeout in milliseconds" to RemoteFileExists extension method.
        if (HomePageUrl.RemoteFileExists(1000))
        {
            // Using the Html Agility Pack, we want to extract only the
            // appropriate data from the remote page.
            HtmlWeb hw = new HtmlWeb();
            HtmlDocument doc = hw.Load(HomePageUrl);
            HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='wrapper1-border']");
            if (node != null)
            {
                about = new Models.Pocos.About { html = node.InnerHtml };
            }
            //todo: look into whether this else statement is necessary
            else
            {
                about = null;
            }
        }
        return this.Jsonp(about);
    }

Then I changed my RemoteFileExists extension method to accept a timeout:

    public static bool RemoteFileExists(this string url, int timeout)
    {
        try
        {
            // Creating the HttpWebRequest
            HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
            // ************ ADDED HERE
            // time out the request after x milliseconds
            request.Timeout = timeout;
            // ************
            // Setting the request method to HEAD; you can also use GET.
            request.Method = "HEAD";
            // Getting the web response; disposing it releases the connection.
            using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
            {
                // Returns true if the status code is 200
                return response.StatusCode == HttpStatusCode.OK;
            }
        }
        catch
        {
            // Any exception, including a timeout, returns false.
            return false;
        }
    }

With this approach, if the timeout fires before RemoteFileExists can determine the header response, the method returns false.

+1

Retrieve the page at your URL using this method:

    private static string retrieveData(string url)
    {
        // used to build the entire input
        StringBuilder sb = new StringBuilder();
        // used on each read operation
        byte[] buf = new byte[8192];

        // prepare the web page we will be asking for
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Timeout = 10; // 10 milliseconds

        // execute the request; GetResponse throws a WebException if the timeout elapses
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        // we will read data via the response stream
        using (Stream resStream = response.GetResponseStream())
        {
            int count;
            do
            {
                // fill the buffer with data
                count = resStream.Read(buf, 0, buf.Length);
                // make sure we read some data
                if (count != 0)
                {
                    // translate from bytes to ASCII text and continue building the string
                    sb.Append(Encoding.ASCII.GetString(buf, 0, count));
                }
            } while (count > 0); // any more data to read?
        }
        return sb.ToString();
    }

Then use the Html Agility Pack to extract the HTML element you need, as follows:

    public static string htmlRetrieveInfo()
    {
        string htmlSource = retrieveData("http://example.com/test.html");
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(htmlSource);
        // SelectSingleNode returns null when nothing matches
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//body");
        return node != null ? node.InnerHtml : null;
    }
+5

Html Agility Pack is open source, so you can change the source code. First, add this code to the HtmlWeb class:

    private int _timeout = 20000;

    public int Timeout
    {
        get { return _timeout; }
        set
        {
            // validate the incoming value, not the current field
            if (value < 1)
                throw new ArgumentException("Timeout must be greater than zero.");
            _timeout = value;
        }
    }

Then find this method:

 private HttpStatusCode Get(Uri uri, string method, string path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) 

and change it as follows:

    req = WebRequest.Create(uri) as HttpWebRequest;
    req.Method = method;
    req.UserAgent = UserAgent;
    req.Timeout = Timeout; // add this

Alternatively, set the timeout through the PreRequest handler, which does not require changing the source:

    htmlWeb.PreRequest = request =>
    {
        request.Timeout = 15000;
        return true;
    };
+5

You can use a standard HttpWebRequest to fetch the remote resource and set its Timeout property. Then, if the request succeeds, pass the resulting HTML to the Html Agility Pack for parsing.
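As a rough sketch of that idea, the helper below fetches a page with a timeout and returns null when the request fails or times out. The class name `TimedFetch` and the method `FetchHtmlOrNull` are hypothetical names made up for illustration; they are not part of the Html Agility Pack or of the answers above. It assumes the URL is an absolute http or https address and that the page can be decoded with StreamReader's default UTF-8 handling.

```csharp
using System;
using System.IO;
using System.Net;

public static class TimedFetch
{
    // Hypothetical helper, for illustration only.
    // Returns the response body, or null if the request fails or times out.
    public static string FetchHtmlOrNull(string url, int timeoutMs)
    {
        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.Timeout = timeoutMs;          // limit on waiting for the response
            request.ReadWriteTimeout = timeoutMs; // limit on each read of the body

            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException)
        {
            // Covers timeouts (WebExceptionStatus.Timeout), refused
            // connections, and HTTP error statuses.
            return null;
        }
        catch (IOException)
        {
            // A failure while reading the body is treated the same way.
            return null;
        }
    }
}
```

If the result is not null, it can be passed to HtmlDocument.LoadHtml in the same way htmlRetrieveInfo does above. Setting ReadWriteTimeout as well matters because Timeout only bounds the wait for the response to start, not the time spent reading the body.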

0

Source: https://habr.com/ru/post/891973/

