Select all links from an Html table using XPath (and HtmlAgilityPack)

Question

I am trying to extract all links whose href attribute starts with http://, https:// or /. The links are inside a table (tbody > tr > td, etc.) with a specific class. I thought I could specify just the a element without the full path to it, but that does not seem to work. I get a NullReferenceException on the line that selects the links:

var table = doc.DocumentNode.SelectSingleNode("//table[@class='containerTable']");
if (table != null)
{
    foreach (HtmlNode item in table.SelectNodes("a[starts-with(@href, 'https://')]"))
    {
        // not working: the foreach line throws NullReferenceException
    }
}

I am not aware of any recommendations or best practices when it comes to XPath. Am I creating overhead when I query the document twice?

c# xpath html-agility-pack

Adam asham Mar 20 '10 at 22:11

2 answers

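Your code does not work because the a elements are not direct children of the table element. They sit further down, under tr and td.

You need to change your xpath to reflect that, for example: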

"tbody/tr/td/a[starts-with(@href, 'https://')]"

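Or, if you want every a element that is a descendant of the current node (i.e. the table), use a relative path. Note the leading dot: a bare // would search from the document root again.

".//a[starts-with(@href, 'https://')]"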

It is worth reading up on XPath.
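For illustration, here is a minimal sketch of the relative query with the null checks that avoid the NullReferenceException from the question. It relies on HtmlAgilityPack returning null from SelectNodes when nothing matches, which is what makes the original foreach throw. The class name TableLinks and the method FindHttpsLinks are made up for this example and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TableLinks   // hypothetical helper, not from the question
{
    // Returns the href of every https:// link below the first table with
    // class "containerTable"; returns an empty list if nothing matches.
    public static List<string> FindHttpsLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        var table = doc.DocumentNode.SelectSingleNode("//table[@class='containerTable']");
        if (table == null)
            return result;

        // ".//a" is evaluated relative to the table node.
        var links = table.SelectNodes(".//a[starts-with(@href, 'https://')]");
        if (links == null)   // null, not an empty collection, when there is no match
            return result;

        foreach (HtmlNode a in links)
            result.Add(a.GetAttributeValue("href", ""));
        return result;
    }
}
```

Because the path is relative, links outside the table are ignored, and it does not matter whether the markup contains a tbody element or not.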

Oded Mar 20 '10 at 22:28

Dimitre Novatchev · Accepted Answer · 2010-03-21T04:37:28+0000

Use:

 //tbody/descendant::a[starts-with(@href,'https://')
                     or
                       starts-with(@href,'http://')
                     or
                       starts-with(@href,'/')
                      ]

Also note that XmlNode.SelectNodes() returns an XmlNodeList, not HtmlNode.
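As a rough sketch of how the three-prefix predicate could be combined with the table lookup from the question, the example below runs a single XPath query against the document instead of two. Whether that makes a measurable difference in overhead is not established in this thread. The names ContainerTableLinks, LinkQuery and Extract are made up for the example, and //table[@class='containerTable']//a is used in place of //tbody/descendant::a so the query does not depend on a tbody element being present in the markup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ContainerTableLinks   // hypothetical helper for illustration
{
    // One expression: find the table and filter its descendant links.
    private const string LinkQuery =
        "//table[@class='containerTable']//a[" +
        "starts-with(@href, 'https://') or " +
        "starts-with(@href, 'http://') or " +
        "starts-with(@href, '/')]";

    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes yields null when nothing matches, so fall back to an empty sequence.
        IEnumerable<HtmlNode> links =
            doc.DocumentNode.SelectNodes(LinkQuery) ?? Enumerable.Empty<HtmlNode>();

        return links.Select(a => a.GetAttributeValue("href", "")).ToList();
    }
}
```

Calling ContainerTableLinks.Extract(html) returns the matching href values in document order, and an empty list when the table or the links are missing.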
