I am trying to extract all links whose href attribute starts with http://, https:// or /. These links are inside a table (tbody > tr > td, etc.) with a specific class. I thought I could specify just the a element without the whole path to it, but that does not seem to work. I get a NullReferenceException on the line that selects the links:
var table = doc.DocumentNode.SelectSingleNode("//table[@class='containerTable']");
if (table != null)
{
    foreach (HtmlNode item in table.SelectNodes("a[starts-with(@href, 'https://')]"))
    {
I don't know of any best practices or recommendations when it comes to XPath. Am I creating overhead by querying the document twice?
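For reference, here is a minimal sketch of what a working version might look like. It rests on two observations: a relative step such as a[...] only matches direct children of the table node, so nothing is found, and HtmlAgilityPack's SelectNodes returns null (not an empty collection) when there are no matches, which is what makes the foreach throw. The sample markup and the example.com URLs are made up for illustration; only the containerTable class comes from the snippet above. The sketch also folds both lookups into a single XPath query, which is one way to avoid querying the document twice.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup standing in for the real page.
        var html = @"<table class='containerTable'><tbody><tr><td>
            <a href='https://example.com/a'>A</a>
            <a href='http://example.com/b'>B</a>
            <a href='/local'>C</a>
            <a href='mailto:someone@example.com'>D</a>
            </td></tr></tbody></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Single query: find the table by class, then use the descendant
        // axis (//a) so links nested in tbody/tr/td are reached as well.
        var links = doc.DocumentNode.SelectNodes(
            "//table[@class='containerTable']//a[" +
            "starts-with(@href, 'http://') or " +
            "starts-with(@href, 'https://') or " +
            "starts-with(@href, '/')]");

        // SelectNodes returns null when nothing matches, so guard before iterating.
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                Console.WriteLine(link.GetAttributeValue("href", ""));
            }
        }
    }
}
```

If the table node is needed for other purposes, the two-step form should work too, provided the second expression is written relative to the table as .//a[...] and the result is checked for null. In this sample the mailto link is excluded by the predicate.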