Get current WebBrowser DOM as HTML

I want to use HtmlAgilityPack on a WebBrowser control that has already loaded everything I need. The control loads a YouTube channel, then my code clicks a button to load every video on that channel. I already have working code that gets the first 30 videos of the channel into a list. The problem is that when I try to get the details of all the videos, I still get only the first 30, even though the WebBrowser page has all of them loaded and displays them.

The code I use to read what is currently loaded in the WebBrowser still returns only the first 30 videos, not all the videos loaded in the WebBrowser.

1 answer

If the target site uses AJAX heavily (as YouTube does), it is difficult, if not impossible, to determine when the page has finished loading and running all of its dynamic scripts. You can get close, though, by handling the window.onload event and allowing an extra second or two for non-deterministic AJAX calls. Then read webBrowser.Document.DomDocument.documentElement.outerHTML via dynamic to get the HTML as currently rendered.

Example:

    private void Form1_Load(object sender, EventArgs e)
    {
        DownloadAsync("http://www.example.com").ContinueWith(
            (task) => MessageBox.Show(task.Result),
            TaskScheduler.FromCurrentSynchronizationContext());
    }

    async Task<string> DownloadAsync(string url)
    {
        TaskCompletionSource<bool> onloadTcs = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = null;

        handler = delegate
        {
            this.webBrowser.DocumentCompleted -= handler;

            // attach to subscribe to DOM onload event
            this.webBrowser.Document.Window.AttachEventHandler("onload", delegate
            {
                // each navigation has its own TaskCompletionSource
                if (onloadTcs.Task.IsCompleted)
                    return; // this should not be happening

                // signal the completion of the page loading
                onloadTcs.SetResult(true);
            });
        };

        // register DocumentCompleted handler
        this.webBrowser.DocumentCompleted += handler;

        // Navigate to url
        this.webBrowser.Navigate(url);

        // continue upon onload
        await onloadTcs.Task;

        // artificial delay for AJAX
        await Task.Delay(1000);

        // the document has been fully loaded, can access DOM here
        return ((dynamic)this.webBrowser.Document.DomDocument).documentElement.outerHTML;
    }
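The fixed one-second delay in the example is only a guess at how long the AJAX calls take. One possible alternative, sketched below, is to poll the HTML until two consecutive reads are identical. `DomPolling`, `WaitUntilStableAsync` and its parameters are hypothetical names made up for this illustration, not part of the WebBrowser API, and the approach assumes the page eventually stops changing.

```csharp
using System;
using System.Threading.Tasks;

static class DomPolling
{
    // Hypothetical helper (not a WebBrowser API): calls getSnapshot repeatedly
    // until two consecutive reads are equal, or until maxAttempts reads have
    // been made, and returns the last snapshot it saw.
    public static async Task<string> WaitUntilStableAsync(
        Func<string> getSnapshot, TimeSpan interval, int maxAttempts)
    {
        string previous = getSnapshot();
        for (int attempt = 1; attempt < maxAttempts; attempt++)
        {
            await Task.Delay(interval);
            string current = getSnapshot();
            if (current == previous)
                return current; // nothing changed during the last interval
            previous = current;
        }
        return previous; // gave up waiting; return whatever is there now
    }
}
```

In `DownloadAsync`, the `Task.Delay(1000)` line and the final `return` could then be replaced by something like `return await DomPolling.WaitUntilStableAsync(() => (string)((dynamic)this.webBrowser.Document.DomDocument).documentElement.outerHTML, TimeSpan.FromMilliseconds(500), 20);`. When awaited from the UI thread, the continuation resumes on the UI thread, so the snapshot delegate can touch the control. A page that keeps mutating its DOM on a timer will never look stable, which is why the attempt cap is there.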

[EDITED] Here is the final piece of code that helped solve the OP's problem:

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(((dynamic)this.webBrowser1.Document.DomDocument).documentElement.outerHTML);
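Once the rendered HTML is in an HtmlAgilityPack document, pulling out the video entries could look roughly like the sketch below. `VideoLinkExtractor` and `ExtractLinks` are hypothetical names, and the idea that video links can be recognised by a marker such as `/watch?v=` in their `href` is an assumption about the page markup, not something taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class VideoLinkExtractor
{
    // Hypothetical helper: returns the distinct href values of all <a>
    // elements whose href contains hrefMarker.
    public static List<string> ExtractLinks(string html, string hrefMarker)
    {
        // Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return new List<string>();

        return anchors
            .Select(a => a.GetAttributeValue("href", string.Empty))
            .Where(href => href.Contains(hrefMarker))
            .Distinct()
            .ToList();
    }
}
```

It would be called with the string obtained from `outerHTML`, for example `VideoLinkExtractor.ExtractLinks(html, "/watch?v=")`. If the list still stops at 30 entries, the HTML was most likely read before the extra videos were added to the DOM.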

Source: https://habr.com/ru/post/1502229/

