Getting the value of JavaScript/HTML variables in C#

There is a webpage I'm trying to extract data from. Looking at the page's HTML source, I found the data I'm interested in inside script tags. It looks like this:

    <html>
    <script type="text/javascript">
        window.gon = {};
        gon.default_profile_mode = false;
        gon.user = null;
        gon.product = "shoes";
        gon.books_jsonarray = [
            { "title": "Little Sun", "authors": [ "John Smith" ], edition: 2, year: 2009 },
            { "title": "Little Prairie", "authors": [ "John Smith" ], edition: 3, year: 2009 },
            { "title": "Little World", "authors": [ "John Smith", "Mary Neil", "Carla Brummer" ], edition: 3, year: 2014 }
        ];
    </script>
    </html>

What I would like to achieve is to load the webpage by its URL, extract the gon variable from the JavaScript, and store it in a C# variable. In other words, I would like a C# data structure (such as a dictionary) that holds the value of gon.

I tried to figure out how to read a variable defined in JavaScript through the C# WebBrowser control, and this is what I came up with:

    using System;
    using System.Collections.Generic;
    using System.Net;
    using System.Reflection;               // BindingFlags
    using System.Runtime.InteropServices;
    using System.Text.RegularExpressions;
    using System.Windows.Forms;
    using mshtml;                          // COM reference: Microsoft HTML Object Library

    namespace Mynamespace
    {
        public partial class Form1 : Form
        {
            public WebBrowser WebBrowser1 = new WebBrowser();

            private void Form1_Load(object sender, EventArgs e)
            {
                string myurl = "http://somewebsite.com";
                // Use the WebBrowser control to load the web page
                this.WebBrowser1.Navigate(myurl);
            }

            private void btnGetValueFromJs_Click(object sender, EventArgs e)
            {
                var mydoc = this.WebBrowser1.Document;
                IHTMLDocument2 vDocument = mydoc.DomDocument as IHTMLDocument2;
                IHTMLWindow2 vWindow = (IHTMLWindow2)vDocument.parentWindow;
                Type vWindowType = vWindow.GetType();

                object strfromJS = vWindowType.InvokeMember("mystr", BindingFlags.GetProperty, null, vWindow, new object[] { });
                // Here, I am able to see the string "Hello Sir"

                object gonfromJS = vWindowType.InvokeMember("gon", BindingFlags.GetProperty, null, vWindow, new object[] { });
                // Here, I am able to see the object gonfromJS as a '{System.__ComObject}'

                object gonbooksfromJS = vWindowType.InvokeMember("gon.books_jsonarray", BindingFlags.GetProperty, null, vWindow, new object[] { });
                // This error is thrown: 'An unhandled exception of type 'System.Runtime.InteropServices.COMException'
                // occurred in mscorlib.dll; (Exception from HRESULT: 0x80020006 (DISP_E_UNKNOWNNAME))'
            }
        }
    }

I can read the values of string or numeric variables, such as:

    var mystr = "Hello Sir";
    var mynbr = 8;

However, although I can see that the variable gon comes back as {System.__ComObject}, I don't know how to access the values of its members. Parsing it directly would be ideal; if that is not possible, I would like a C# key/value data structure containing all of gon's members, and in particular the ability to read gon.books_jsonarray.

Any help on how to achieve this would be greatly appreciated. Note that I can't change the source HTML/JavaScript in any way, so I need C# code that achieves this on its own.

2 answers
  • Use JSON.stringify to convert the gon.books_jsonarray variable to a JSON string.

  • Extract that JSON with the following C# code:

      var gonFromJS = mydoc.InvokeScript("eval", new object[] { "JSON.stringify(gon.books_jsonarray)" }).ToString();

  • Deserialize the JSON into objects using Newtonsoft.Json.
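The deserialization step can be tried on its own, without a browser. This is a minimal sketch that assumes System.Text.Json (built into .NET Core 3.0 and later) in place of Newtonsoft.Json, and assumes the input is the string JSON.stringify produces for the sample page: JSON.stringify quotes the unquoted edition and year keys, so the result is valid JSON. The Book and BookParser names are illustrative.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Mirrors one element of gon.books_jsonarray (illustrative type).
public class Book
{
    public string Title { get; set; }
    public List<string> Authors { get; set; }
    public int Edition { get; set; }
    public int Year { get; set; }
}

public static class BookParser
{
    // The JSON keys are lower-case ("title"), the C# properties are PascalCase,
    // so case-insensitive matching is switched on. Newtonsoft.Json already
    // matches property names case-insensitively by default.
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

    // Input: the string returned by JSON.stringify(gon.books_jsonarray).
    public static List<Book> Parse(string json) =>
        JsonSerializer.Deserialize<List<Book>>(json, Options);
}
```

With Newtonsoft.Json, the equivalent is the JsonConvert.DeserializeObject<List<...>> call shown in the complete code.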

My complete code is here:

    using Newtonsoft.Json;
    using System;
    using System.Collections.Generic;
    using System.Windows.Forms;

    namespace WindowsFormsApp1
    {
        public partial class Form1 : Form
        {
            public Form1()
            {
                InitializeComponent();
            }

            private void Form1_Load(object sender, EventArgs e)
            {
                var webBrowser = new WebBrowser();

                // Read the variable only once the document has finished loading
                webBrowser.DocumentCompleted += (s, ea) =>
                {
                    var mydoc = webBrowser.Document;
                    // Let the page itself serialize the array to a JSON string
                    var gonFromJS = mydoc.InvokeScript("eval", new object[] { "JSON.stringify(gon.books_jsonarray)" }).ToString();
                    // Map the JSON onto the Books class below
                    var gonObject = JsonConvert.DeserializeObject<List<Books>>(gonFromJS);
                };

                var myurl = "http://localhost/test.html";
                webBrowser.Navigate(myurl);
            }

            private class Books
            {
                public string Title { get; set; }
                public List<string> Authors { get; set; }
                public int Edition { get; set; }
                public int Year { get; set; }
            }
        }
    }

You can also see the output in the screenshot.
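The question also asks for the whole gon variable as a key/value structure, not just the books array. A minimal sketch, assuming System.Text.Json (.NET Core 3.0 and later) rather than Newtonsoft.Json, and assuming the input is the string returned by evaluating JSON.stringify(gon) in the page, the same way the answer stringifies gon.books_jsonarray. GonParser and ParseGon are illustrative names.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class GonParser
{
    // Input: the string produced by JSON.stringify(gon) in the page (assumption).
    public static Dictionary<string, JsonElement> ParseGon(string gonJson)
    {
        var result = new Dictionary<string, JsonElement>();
        using (JsonDocument doc = JsonDocument.Parse(gonJson))
        {
            // One entry per top-level member of gon; nested values such as
            // books_jsonarray stay as JsonElement and can be read further.
            foreach (JsonProperty member in doc.RootElement.EnumerateObject())
            {
                // Clone() detaches the element so it stays valid after doc is disposed.
                result[member.Name] = member.Value.Clone();
            }
        }
        return result;
    }
}
```

On the page, the input would come from mydoc.InvokeScript("eval", new object[] { "JSON.stringify(gon)" }). Members such as gon["product"].GetString() or gon["books_jsonarray"][0] can then be read directly, and gon["books_jsonarray"] can still be turned into typed objects if needed.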

EDIT

You may also run into a problem with JSON.stringify: the call can return null.

In that case, see these Stack Overflow threads: here and here.

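If JSON.stringify returns null, try adding the following to your HTML page. By default the WebBrowser control renders pages in IE7 compatibility mode, which has no JSON object; this meta tag requests the newest document mode available on the machine:

    <head>
        <meta http-equiv='X-UA-Compatible' content='IE=edge' >
    </head>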
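Calling .ToString() directly on the result of InvokeScript throws a NullReferenceException whenever the script returns nothing. For pages you cannot edit, one option is to guard both sides: the script checks that JSON exists before calling it, and the C# side accepts only a non-empty string. This is a sketch; SafeStringify, BuildScript and TryGetJson are hypothetical helpers, not part of the WebBrowser API.

```csharp
// Hypothetical helpers, not part of the WebBrowser API.
public static class SafeStringify
{
    // Script text for InvokeScript("eval", ...): evaluates to null instead of
    // raising a script error when the document mode has no JSON object.
    public static string BuildScript(string jsExpression) =>
        "typeof JSON === 'undefined' ? null : JSON.stringify(" + jsExpression + ")";

    // InvokeScript returns object; accept it only when it is a non-empty string,
    // so a null (or any other non-string) result is rejected instead of throwing.
    public static bool TryGetJson(object scriptResult, out string json)
    {
        json = scriptResult as string;
        return !string.IsNullOrEmpty(json);
    }
}
```

In the answer's DocumentCompleted handler, BuildScript("gon.books_jsonarray") would replace the literal script string, and TryGetJson would replace the .ToString() call. When it returns false, the page is most likely running in a document mode without JSON.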

You can cast the result of InvokeMember() to dynamic and use the JavaScript property names directly in C# code. Indexing arrays is trickier, but it can be done with an additional InvokeScript() call, as in my example:

    private void btnGetValueFromJs_Click(object sender, EventArgs e)
    {
        var mydoc = this.WebBrowser1.Document;
        IHTMLDocument2 vDocument = mydoc.DomDocument as IHTMLDocument2;
        IHTMLWindow2 vWindow = (IHTMLWindow2)vDocument.parentWindow;
        Type vWindowType = vWindow.GetType();

        // Casting to dynamic lets C# resolve the JavaScript property names at run time
        var gonfromJS = (dynamic)vWindowType.InvokeMember("gon", BindingFlags.GetProperty, null, vWindow, new object[] { });
        var length = gonfromJS.books_jsonarray.length;

        for (var i = 0; i < length; ++i)
        {
            // Fetch each array element by evaluating an indexed expression in the page
            var book = (dynamic)mydoc.InvokeScript("eval", new object[] { "gon.books_jsonarray[" + i + "]" });
            Console.WriteLine(book.title);
            /* prints:
             * Little Sun
             * Little Prairie
             * Little World
             */
        }
    }
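The DISP_E_UNKNOWNNAME error in the question has a simple cause: InvokeMember treats "gon.books_jsonarray" as one member name, and the window has no member whose name is that literal dotted string. The dynamic approach works because it asks for one name at a time. The same idea can be kept without dynamic by walking the path segment by segment; this is a sketch, and ScriptPath and GetByPath are hypothetical names.

```csharp
using System;
using System.Reflection;

// Hypothetical helper: resolves a dotted path one member at a time, because
// InvokeMember looks up its name argument as a single member name.
public static class ScriptPath
{
    public static object GetByPath(object root, string dottedPath)
    {
        object current = root;
        foreach (string segment in dottedPath.Split('.'))
        {
            if (current == null)
                throw new InvalidOperationException($"Value is null before '{segment}'.");

            // For a COM object (like vWindow in the question) the call goes through
            // IDispatch; for an ordinary .NET object it reads a public instance property.
            current = current.GetType().InvokeMember(
                segment,
                BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance,
                null, current, new object[0]);
        }
        return current;
    }
}
```

Against the question's page, GetByPath(vWindow, "gon.books_jsonarray.length") would be the expected way to read the array length, mirroring gonfromJS.books_jsonarray.length above; that is untested against a live page here. Array elements are still easiest to reach with the eval call shown in the loop.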

Source: https://habr.com/ru/post/1274937/

