How can I get all the content in the <td> tag using the Agility Pack?
So, I am writing an application that will do a little screen scraping. I use the HTML Agility Pack to load an entire HTML page into an HtmlDocument instance called doc. Now I want to parse this document looking for this:
<table border="0" cellspacing="3">
<tr><td>First rows stuff</td></tr>
<tr>
<td>
The data I want is in here <br />
and it seperated by these annoying <br /> 's.
No id's, classes, or even a single <p> tag. </p> Just a bunch of <br /> tags.
</td>
</tr>
</table>
So I just need to get the data in the second row. How can I do this? Should I use regex or something else?
Update: This is how I load doc:
HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load(Url);
The Html Agility Pack supports XPath, so you can select the cell directly. Something like this:
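using System.Linq; // needed for Single()
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load("input.html");
// Select the <td> in the second row of the table that has cellspacing="3".
// Note: SelectNodes returns null when nothing matches, so check for that in real code.
HtmlNode node = doc.DocumentNode
    .SelectNodes("//table[@cellspacing='3']/tr[2]/td")
    .Single();
string text = node.InnerText;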
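Since the data in that cell is separated by <br /> tags, one possible follow-up is to collect the individual text nodes inside the selected cell instead of reading InnerText as a single string. The following is only a sketch under stated assumptions: the helper name GetLines and the inline sample markup are hypothetical and not part of the original answer. It relies on Descendants(), HtmlNodeType.Text, and HtmlEntity.DeEntitize from the Html Agility Pack.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class CellLines
{
    // Hypothetical helper: returns the non-empty text fragments found
    // inside a node, one per run of text between tags such as <br />.
    public static List<string> GetLines(HtmlNode cell)
    {
        return cell.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Text)              // keep text nodes only
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())   // decode entities, trim whitespace
            .Where(s => s.Length > 0)                                 // drop whitespace-only fragments
            .ToList();
    }

    static void Main()
    {
        // Assumed sample markup, shaped like the table in the question.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<table border='0' cellspacing='3'>" +
            "<tr><td>First rows stuff</td></tr>" +
            "<tr><td>one<br />two<br />three</td></tr>" +
            "</table>");

        // SelectSingleNode returns null when nothing matches.
        HtmlNode cell = doc.DocumentNode
            .SelectSingleNode("//table[@cellspacing='3']/tr[2]/td");
        if (cell == null)
        {
            Console.WriteLine("No matching cell found.");
            return;
        }

        foreach (string line in GetLines(cell))
        {
            Console.WriteLine(line);
        }
    }
}
```

Walking the text nodes avoids splitting on the literal string "<br />", so the same code should also cope with variants such as <br> or <br/>.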
If you are already using the Agility Pack, then it is just a matter of using something like doc.DocumentNode.SelectNodes("//table[@cellspacing='3']") to get the table in the document. Try looking at the documentation and sample code. Since you already have the data in a structured form, it would be silly to go back to raw text and parse it all over again.