Since I'm still not familiar with XPath, I prefer using LINQ with HtmlAgilityPack. I think this is one of those times when I need an XPath solution, so I need your help.
Consider this simplified HTML snippet:
<td><b>Billing informations:</b>
    <table>
        <tr>
            <td style="color: #757575; padding-left: 10px; padding-bottom: 20px;">
                Invoice-Number:1534753<br />Transactioncode: 1WF772582A4041717
            </td>
        </tr>
    </table>
</td>
This is part of a larger HTML page, but it demonstrates the problem that I have. I need to extract the Invoice-Number and the Transactioncode. Sometimes the text is in a span, and sometimes directly in the cell, as here. Therefore, I need a way that works in both cases.
I tried this:
var invoiceCell = doc.DocumentNode.Descendants("td")
    .FirstOrDefault(cell => cell.InnerText.Contains("Invoice-Number"));
if (invoiceCell != null)
{
    string text = invoiceCell.InnerText;
}
The problem is that invoiceCell.InnerText returns the InnerText of the outermost cell, not that of the cell that directly contains Invoice-Number. Therefore, text also contains "Billing informations:":
Billing informations:
Invoice-Number:1534753Transactioncode: 1WF772582A4041717
So, how do I get the innermost HTML element whose InnerText contains a given text? A LINQ solution would be ideal.
Of course, I could use LastOrDefault instead of FirstOrDefault, like this:
var invoiceCell = doc.DocumentNode.Descendants("td")
    .LastOrDefault(cell => cell.InnerText.Contains("Invoice-Number"));
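For comparison, here is a minimal sketch of a LINQ-only way to express "innermost cell" directly, rather than relying on the order in which Descendants returns the cells. It picks the first td whose InnerText contains the text and which has no nested td that also contains it. The class name InnermostCellSketch and the helper FindInnermostCell are hypothetical names made up for this illustration, and the HTML in Main is a shortened copy of the snippet above wrapped in an outer table. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class InnermostCellSketch
{
    // Hypothetical helper: returns the innermost <td> whose InnerText contains
    // the given text, or null if no cell matches.
    public static HtmlNode FindInnermostCell(HtmlDocument doc, string text)
    {
        return doc.DocumentNode.Descendants("td")
            .FirstOrDefault(cell =>
                cell.InnerText.Contains(text) &&
                // Reject any cell that still has a nested cell containing the text.
                !cell.Descendants("td").Any(inner => inner.InnerText.Contains(text)));
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(@"<table><tr><td><b>Billing informations:</b>
<table><tr><td style=""color: #757575;"">
Invoice-Number:1534753<br />Transactioncode: 1WF772582A4041717
</td></tr></table></td></tr></table>");

        var cell = FindInnermostCell(doc, "Invoice-Number");
        if (cell != null)
        {
            // Only the inner cell's text is printed, without the billing header.
            Console.WriteLine(cell.InnerText.Trim());
        }
    }
}
```

Because the check is done at the td level, the same cell should be returned whether the text sits directly in the cell or inside a span within it. The nested Descendants call re-scans each candidate's subtree, so this may be slow on very large pages.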