How to get the next two nodes in HTML with HtmlAgilityPack

I have a table in the HTML below:

<table style="padding: 0px; border-collapse: collapse;">
    <tr>
        <td><h3>My Regional Financial Office</h3></td>
    </tr>
    <tr>
        <td>&#160;</td>
    </tr>
    <tr>
        <td><h3>My Address</h3></td>
    </tr>
    <tr>
        <td>000 Test Ave S Ste 000</td>
    </tr>
    <tr>
        <td>Golden Valley, MN 00000</td>
    </tr>
    <tr>
        <td><a href="javascript:submitForm('0000','0000000');">Get Directions</a></td>
    </tr>
    <tr>
        <td>&#160;</td>
    </tr>
</table>

How can I get the inner text of the two <tr> elements that follow the row containing the text "My Address"?

1 answer

You can use the following XPath:

// html is assumed to hold the markup from the question.
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var tdOfInterests =
        htmlDoc.DocumentNode
               .SelectNodes("//tr[td/h3[.='My Address']]/following-sibling::tr[position() <= 2]/td");
// SelectNodes returns null (not an empty collection) when nothing matches.
if (tdOfInterests != null)
{
    foreach (HtmlNode td in tdOfInterests)
    {
        // Given the HTML in the question, this prints the following 2 lines:
        // 000 Test Ave S Ste 000
        // Golden Valley, MN 00000
        Console.WriteLine(td.InnerText);
    }
}

The key part of the XPath above is the following-sibling axis combined with a position() filter.

UPDATE:

A brief explanation of the XPath used in this answer:

//tr[td/h3[.='My Address']]

The above selects a <tr> element that has:

  • a child <td> that has a child <h3> with the value 'My Address'

/following-sibling::tr[position() <= 2]

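From the <tr> selected by the previous step, select the following sibling <tr> elements whose position is <= 2 (positions start at 1 in XPath, not 0).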

/td

From those <tr> elements, select the child <td> elements.
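To see what each of the three XPath steps above contributes, they can be evaluated one at a time. The following is only a minimal sketch: it assumes the table snippet is well-formed XML, so the built-in System.Xml.Linq XPath support can stand in for HtmlAgilityPack (real-world HTML usually is not well-formed and would still need HtmlAgilityPack). The class XPathSteps, the helper TextsOf, and the shortened table (with the href replaced by "#") are hypothetical names and data introduced for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathSteps
{
    // Shortened copy of the question's table; well-formed, so XDocument can parse it.
    public const string Table = @"<table>
  <tr><td><h3>My Regional Financial Office</h3></td></tr>
  <tr><td>&#160;</td></tr>
  <tr><td><h3>My Address</h3></td></tr>
  <tr><td>000 Test Ave S Ste 000</td></tr>
  <tr><td>Golden Valley, MN 00000</td></tr>
  <tr><td><a href=""#"">Get Directions</a></td></tr>
  <tr><td>&#160;</td></tr>
</table>";

    // Hypothetical helper: trimmed text of every element the XPath selects,
    // in document order.
    public static string[] TextsOf(string xpath) =>
        XDocument.Parse(Table)
                 .XPathSelectElements(xpath)
                 .Select(e => e.Value.Trim())
                 .ToArray();

    public static void Main()
    {
        const string anchor = "//tr[td/h3[.='My Address']]";

        // Step 1: the single row whose <td> contains <h3>My Address</h3>.
        Console.WriteLine(TextsOf(anchor).Length);                            // 1

        // Step 2 without the filter: every <tr> sibling after the anchor row.
        Console.WriteLine(TextsOf(anchor + "/following-sibling::tr").Length); // 4

        // Steps 2 and 3 together: the first two following rows, then their <td> children.
        foreach (string text in TextsOf(
            anchor + "/following-sibling::tr[position() <= 2]/td"))
        {
            Console.WriteLine(text);
        }
    }
}
```

Dropping the position() filter returns all four rows after the anchor, including the "Get Directions" and blank rows, which is why the filter is needed to stop after the two address lines.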


Source: https://habr.com/ru/post/1676151/
