Import data from an HTML table into a DataTable in C#

I want to import data from an HTML table (here is the link: http://road2paris.com/wp-content/themes/roadtoparis/api/generated_table_august.html ) and display the first 16 people in a DataGridView in my Windows Forms application. From what I have read, the best way to do this is with the HTML Agility Pack, so I downloaded it and added it to my project. I understand that the first step is to load the contents of the HTML file. Here is the code I used for that:

    // Requires: using System.Net;
    string htmlCode = "";
    using (WebClient client = new WebClient())
    {
        client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
        htmlCode = client.DownloadString("http://road2paris.com/wp-content/themes/roadtoparis/api/generated_table_august.html");
    }

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(htmlCode);

And then I got stuck. I do not know how to populate a DataTable with the data from the HTML table. I have tried many different solutions, but nothing seems to work properly. I would be grateful if anyone could help me with this.

2 answers
    // Requires: using System.Data; using System.Linq; using HtmlAgilityPack;
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(htmlCode);

    // Create one column per <th> header cell.
    // Note: SelectNodes returns null when nothing matches the XPath.
    var headers = doc.DocumentNode.SelectNodes("//tr/th");
    DataTable table = new DataTable();
    foreach (HtmlNode header in headers)
        table.Columns.Add(header.InnerText);

    // Add one row per <tr> that contains <td> cells.
    foreach (var row in doc.DocumentNode.SelectNodes("//tr[td]"))
        table.Rows.Add(row.SelectNodes("td").Select(td => td.InnerText).ToArray());

To use this code, you need the HTML Agility Pack library.
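The question also asks for only the first 16 people to be shown in the grid. One possible way to do that, once the DataTable has been filled, is to copy its leading rows into a second table. The sketch below is an illustration only: `DataTableExtensions` and `TakeFirstRows` are hypothetical names that do not come from the answer, and the code uses nothing beyond `System.Data`.

```csharp
using System;
using System.Data;

public static class DataTableExtensions
{
    // Hypothetical helper (not part of the original answer): returns a new
    // table with the same columns as `source` and at most `count` of its
    // leading rows.
    public static DataTable TakeFirstRows(this DataTable source, int count)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));

        DataTable result = source.Clone(); // copies the schema, not the rows
        int limit = Math.Min(count, source.Rows.Count);
        for (int i = 0; i < limit; i++)
        {
            result.ImportRow(source.Rows[i]);
        }
        return result;
    }
}
```

Assuming the form contains a DataGridView named `dataGridView1` (an assumed control name), the result could then be bound with `dataGridView1.DataSource = table.TakeFirstRows(16);`.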


The code below avoids duplicate column headers: every column in a DataTable must have a unique name. In addition, an HTML row sometimes contains more cells than there are header columns. In that case you need to add extra columns to the DataTable, otherwise that row's data is lost. This was my solution.

    using System;
    using System.Data;
    using System.Diagnostics;
    using System.Linq;
    using System.Net;
    using HtmlAgilityPack;

    public enum DuplicateHeaderReplacementStrategy { AppendAlpha, AppendNumeric, Delete }

    public class HtmlServices
    {
        private static readonly string[] Alpha = { "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z" };

        public static HtmlDocument RenameDuplicateHeaders(HtmlDocument doc, DuplicateHeaderReplacementStrategy strategy)
        {
            var index = 0;
            var tables = doc.DocumentNode.SelectNodes("//table");
            if (tables == null) return doc; // SelectNodes returns null when nothing matches

            foreach (HtmlNode table in tables)
            {
                var headerCells = table.SelectNodes(".//th");
                if (headerCells == null) continue;

                // Group headers by caption; every header after the first in a group is a duplicate.
                var duplicates = headerCells
                    .GroupBy(th => th.InnerText.Trim())
                    .Where(g => g.Count() > 1)
                    .SelectMany(g => g.Skip(1))
                    .ToList();

                foreach (HtmlNode th in duplicates)
                {
                    switch (strategy)
                    {
                        case DuplicateHeaderReplacementStrategy.AppendNumeric:
                            th.InnerHtml += index++;
                            break;
                        case DuplicateHeaderReplacementStrategy.AppendAlpha:
                            th.InnerHtml += Alpha[index++ % Alpha.Length];
                            break;
                        case DuplicateHeaderReplacementStrategy.Delete:
                            th.InnerHtml = string.Empty;
                            break;
                    }
                }
            }
            return doc;
        }

        // ColumnExists and StripHtmlTables are extension methods that this answer does not define.
        // The htmlIds parameter is not used.
        public static DataTable GetDataTableFromHtmlTable(string url, string[] htmlIds)
        {
            ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;
            HtmlWeb web = new HtmlWeb();
            HtmlDocument doc = web.Load(url);
            doc = RenameDuplicateHeaders(doc, DuplicateHeaderReplacementStrategy.AppendNumeric);

            // Create columns from th, appending a number to any name that is still taken.
            DataTable table = new DataTable();
            var headers = doc.DocumentNode.SelectNodes("//tr/th");
            if (headers != null)
            {
                foreach (HtmlNode header in headers)
                {
                    string name = header.InnerText;
                    if (table.ColumnExists(name))
                    {
                        int columnIteration = 0;
                        while (table.ColumnExists(name + columnIteration)) columnIteration++;
                        name += columnIteration;
                    }
                    table.Columns.Add(name);
                }
            }

            // Select rows with td elements.
            var rows = doc.DocumentNode.SelectNodes("//tr[td]");
            if (rows == null) return table;

            foreach (var row in rows)
            {
                var addRow = row.SelectNodes("td").Select(td => td.InnerHtml.StripHtmlTables()).ToArray();

                // A row wider than the header gets extra, uniquely numbered columns.
                while (table.Columns.Count < addRow.Length)
                    table.Columns.Add($"ExtraColumn {table.Columns.Count + 1}");

                try { table.Rows.Add(addRow); }
                catch (Exception e) { Debug.Print(e.Message); }
            }
            return table;
        }
    }
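The code above calls `ColumnExists` and `StripHtmlTables`, which the answer does not define, so it will not compile as posted. The following is a minimal sketch of what such helpers might look like. Both bodies are assumptions: `ColumnExists` is taken to be a name lookup in the table's column collection, and `StripHtmlTables` is guessed to remove nested `<table>` markup from a cell. The class name `HtmlTableImportExtensions` is also hypothetical.

```csharp
using System;
using System.Data;
using System.Text.RegularExpressions;

public static class HtmlTableImportExtensions
{
    // Assumed behavior: true when the table already has a column with this
    // name. DataColumnCollection.Contains compares names case-insensitively.
    public static bool ColumnExists(this DataTable table, string columnName)
    {
        return table.Columns.Contains(columnName);
    }

    // Assumed behavior (a guess at the author's intent): remove any
    // <table>...</table> blocks embedded in a cell and trim the remainder.
    // A regular expression is not a full HTML parser, so tables nested more
    // than one level deep are not handled correctly by this sketch.
    public static string StripHtmlTables(this string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        string withoutTables = Regex.Replace(
            html,
            @"<table\b.*?</table\s*>",
            string.Empty,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        return withoutTables.Trim();
    }
}
```

If the cells are expected to hold plain text only, reading `td.InnerText` as in the first answer may be a simpler alternative to stripping markup by hand.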

Source: https://habr.com/ru/post/1495617/

