Select all links from an Html table using XPath (and HtmlAgilityPack)

I am trying to extract all links whose href attribute starts with http://, https:// or /. These links are inside a table (tbody > tr > td, etc.) with a specific class. I thought I could specify just the a element without the whole path to it, but that does not seem to work. I get a NullReferenceException on the line that selects the links:

var table = doc.DocumentNode.SelectSingleNode("//table[@class='containerTable']");
if (table != null)
{
    foreach (HtmlNode item in table.SelectNodes("a[starts-with(@href, 'https://')]"))
    {
        // not working
    }
}

I don't know of any recommendations or best practices when it comes to XPath. Am I creating overhead by querying the document twice?

+3
2 answers

Use:

 //tbody/descendant::a[starts-with(@href,'https://')
                     or
                       starts-with(@href,'http://')
                     or
                       starts-with(@href,'/')
                      ]

Also note that, as far as I know, XmlNode.SelectNodes() returns an XmlNodeList, not an HtmlNode.
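A minimal sketch of how the predicate above might be applied with HtmlAgilityPack, relative to the table node from the question. It assumes the HtmlAgilityPack NuGet package is referenced; the names LinkExtractor and ExtractLinks are made up for illustration. HtmlNode.SelectNodes() returns null rather than an empty collection when nothing matches, which is the likely cause of the NullReferenceException in the foreach.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper, not part of HtmlAgilityPack.
public static class LinkExtractor
{
    // './/a' is relative: any <a> below the context node (here, the table).
    private const string LinkXPath =
        ".//a[starts-with(@href, 'https://')" +
        " or starts-with(@href, 'http://')" +
        " or starts-with(@href, '/')]";

    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[@class='containerTable']");
        if (table == null)
            return links; // no such table in the document

        // SelectNodes yields null (not an empty collection) when nothing matches,
        // so check before iterating.
        HtmlNodeCollection anchors = table.SelectNodes(LinkXPath);
        if (anchors == null)
            return links;

        foreach (HtmlNode a in anchors)
            links.Add(a.GetAttributeValue("href", string.Empty));

        return links;
    }
}
```

Calling LinkExtractor.ExtractLinks(html) on a page whose containerTable holds no matching links then gives an empty list instead of throwing.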

+3

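As written, your XPath looks for a elements that are direct children of the table, and there are none. The links sit further down, inside tr and td.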

So your XPath should spell out the path, like this:

"tbody/tr/td/a[starts-with(@href, 'https://')]"

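Or, if you want every a anywhere below the current node (i.e. any descendant), use a relative path. Note the leading dot: without it, //a searches the whole document, not just the current node:

".//a[starts-with(@href, 'https://')]"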

For more on XPath syntax, see an XPath reference.
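To make the difference between these path forms concrete, here is a small sketch that counts matches for each one against the same table node. The sample HTML, the example.com-style URLs, and the XPathContextDemo and Count names are invented for illustration; it assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical demo class, not part of HtmlAgilityPack.
public static class XPathContextDemo
{
    // Number of matches, treating HtmlAgilityPack's null result as zero.
    public static int Count(HtmlNode context, string xpath)
    {
        HtmlNodeCollection nodes = context.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Invented sample: one link outside the table, two inside it.
        const string html = @"
<a href='https://outside.example/'>outside the table</a>
<table class='containerTable'>
  <tbody>
    <tr><td><a href='https://inside.example/'>child of td</a></td></tr>
    <tr><td><span><a href='https://nested.example/'>nested deeper</a></span></td></tr>
  </tbody>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[@class='containerTable']");

        const string pred = "[starts-with(@href, 'https://')]";

        Console.WriteLine(Count(table, "a" + pred));             // direct children of <table> only
        Console.WriteLine(Count(table, "tbody/tr/td/a" + pred)); // exactly this path
        Console.WriteLine(Count(table, ".//a" + pred));          // any descendant of <table>
        Console.WriteLine(Count(table, "//a" + pred));           // anywhere in the document
    }
}
```

With this sample the four counts should come out as 0, 1, 2 and 3. The bare a form finds nothing because the table's only element child is tbody, the explicit path misses the link wrapped in a span, and the absolute //a form also picks up the link outside the table.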

+2

Source: https://habr.com/ru/post/1737857/

