If the target site relies heavily on AJAX (as YouTube does), it is difficult, if not impossible, to determine when the page has finished loading and running all of its dynamic scripts. You can get close, though, by handling the window.onload event and then allowing an extra second or two for non-deterministic AJAX calls. After that, read webBrowser.Document.DomDocument.documentElement.outerHTML via dynamic to get the HTML as it is currently rendered.
Example:
    private void Form1_Load(object sender, EventArgs e)
    {
        DownloadAsync("http://www.example.com").ContinueWith(
            (task) => MessageBox.Show(task.Result),
            TaskScheduler.FromCurrentSynchronizationContext());
    }

    async Task<string> DownloadAsync(string url)
    {
        TaskCompletionSource<bool> onloadTcs = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = null;

        handler = delegate
        {
            this.webBrowser.DocumentCompleted -= handler;

            // attach to subscribe to DOM onload event
            this.webBrowser.Document.Window.AttachEventHandler("onload", delegate
            {
                // each navigation has its own TaskCompletionSource
                if (onloadTcs.Task.IsCompleted)
                    return; // this should not be happening

                // signal the completion of the page loading
                onloadTcs.SetResult(true);
            });
        };

        // register DocumentCompleted handler
        this.webBrowser.DocumentCompleted += handler;

        // Navigate to url
        this.webBrowser.Navigate(url);

        // continue upon onload
        await onloadTcs.Task;

        // artificial delay for AJAX
        await Task.Delay(1000);

        // the document has been fully loaded, can access DOM here
        return ((dynamic)this.webBrowser.Document.DomDocument).documentElement.outerHTML;
    }
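The core trick in DownloadAsync is turning a one-shot event into an awaitable task with TaskCompletionSource and then adding a grace period. Here is a minimal sketch of just that pattern, outside of WinForms. FakeLoader, its Loaded event, and WaitForLoadAsync are hypothetical names made up for illustration; they only stand in for the WebBrowser control and its onload event.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a control that raises a "loaded" event once.
class FakeLoader
{
    public event EventHandler Loaded;

    public void Start()
    {
        // Raise the event from a background thread after a short pause.
        Task.Run(async () =>
        {
            await Task.Delay(100);
            Loaded?.Invoke(this, EventArgs.Empty);
        });
    }
}

static class Demo
{
    // Completes once Loaded has fired, then waits an extra grace period
    // (the counterpart of the "artificial delay for AJAX").
    public static async Task<bool> WaitForLoadAsync(FakeLoader loader, int graceMs)
    {
        var tcs = new TaskCompletionSource<bool>();
        EventHandler handler = null;

        handler = (s, e) =>
        {
            loader.Loaded -= handler;  // one-shot: unsubscribe on first call
            tcs.TrySetResult(true);    // does not throw if already completed
        };

        loader.Loaded += handler;
        loader.Start();

        await tcs.Task;                // continue once the event has fired
        await Task.Delay(graceMs);     // extra time for late work
        return tcs.Task.Result;
    }

    static async Task Main()
    {
        bool loaded = await WaitForLoadAsync(new FakeLoader(), 50);
        Console.WriteLine(loaded);
    }
}
```

TrySetResult is used here instead of the IsCompleted check plus SetResult; the effect is the same, it just avoids an exception if the event is ever raised twice.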
[EDITED] Here is the final piece of code that helped solve the OP's problem:
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(((dynamic)this.webBrowser1.Document.DomDocument).documentElement.outerHTML);
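Once the rendered HTML is loaded into an HtmlAgilityPack document, it can be queried with XPath. The following is only a sketch of one possible next step, assuming the HtmlAgilityPack NuGet package is referenced; the LinkExtractor helper and the sample HTML string are made up for illustration and are not part of the OP's code.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package, assumed to be referenced

static class LinkExtractor
{
    // Returns the href value of every <a href="..."> element in the HTML.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return links;

        foreach (HtmlNode anchor in anchors)
            links.Add(anchor.GetAttributeValue("href", string.Empty));

        return links;
    }

    static void Main()
    {
        // Made-up sample; in the real program this would be the outerHTML string.
        string html = "<html><body><a href=\"/watch?v=1\">One</a>" +
                      "<a href=\"/watch?v=2\">Two</a></body></html>";

        foreach (string href in ExtractLinks(html))
            Console.WriteLine(href);
    }
}
```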