Get only InnerText of this node excluding children

Since I'm still not familiar with XPath, I prefer using LINQ with HtmlAgilityPack. I think this is one of those times when I need an XPath solution, so I need your help.

Consider this simplified HTML snippet:

<td><b>Billing informations:</b>
    <table>
        <tr>
            <td style="color: #757575; padding-left: 10px; padding-bottom: 20px;">
                Invoice-Number:1534753<br />Transactioncode: 1WF772582A4041717
            </td>
        </tr>
    </table>
</td>

This is part of a larger HTML page, but it demonstrates the problem I have. I need to extract the Invoice-Number and the Transactioncode. Sometimes the text is in a span, and sometimes directly in the cell, as here. Therefore, I need a way that works in both cases.

I tried this:

var invoiceCell = doc.DocumentNode.Descendants("td")
    .FirstOrDefault(cell => cell.InnerText.Contains("Invoice-Number"));
if (invoiceCell != null)
{
    string text = invoiceCell.InnerText;
    // use string methods to extract both values
}

The problem is that invoiceCell.InnerText returns the InnerText of the outermost cell, not of the cell that directly contains Invoice-Number. Therefore, text also contains "Billing informations:":

Billing informations:



                Invoice-Number:1534753Transactioncode: 1WF772582A4041717

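The cells are nested in this HTML, and InnerText returns the text of a node together with all of its descendants, so the outer cell matches as well. This can still be handled with LINQ.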

Using LastOrDefault instead of FirstOrDefault returns the last matching cell in document order, which here is the innermost one:

var invoiceCell = doc.DocumentNode.Descendants("td")
    .LastOrDefault(cell => cell.InnerText.Contains("Invoice-Number"));
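LastOrDefault depends on the wanted cell being the last match in the document. A variant that does not rely on ordering is to test only each cell's direct text children. The following is a minimal sketch of that idea, not a definitive solution: the class name and the trimmed sample HTML are assumptions made for illustration, and it does not cover the case where the text sits inside a span.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class OwnTextSketch
{
    static void Main()
    {
        // Trimmed version of the snippet from the question (assumed input).
        var html = @"<td><b>Billing informations:</b>
<table><tr><td>Invoice-Number:1534753<br />Transactioncode: 1WF772582A4041717</td></tr></table></td>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Match a cell only if one of its direct text children (not a descendant
        // element) contains the label, so the outer cell is skipped.
        var invoiceCell = doc.DocumentNode.Descendants("td")
            .FirstOrDefault(cell => cell.ChildNodes.Any(n =>
                n.NodeType == HtmlNodeType.Text &&
                n.InnerText.Contains("Invoice-Number")));

        if (invoiceCell != null)
        {
            // Collect the cell's own text nodes, ignoring child elements such as <br />.
            var ownText = invoiceCell.ChildNodes
                .Where(n => n.NodeType == HtmlNodeType.Text)
                .Select(n => n.InnerText.Trim())
                .Where(t => t.Length > 0);

            foreach (var line in ownText)
                Console.WriteLine(line);
        }
    }
}
```

Because the br element separates two text nodes, this yields the two values as separate strings instead of one concatenated string.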

Alternatively, with XPath, checking only the cell's own text or a span directly inside it:

var xpath = "//td[contains(text(),'Invoice-Number') or contains(span,'Invoice-Number')]";
var invoiceCell = doc.DocumentNode.SelectSingleNode(xpath);
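One caveat: in XPath 1.0, contains(text(), ...) only inspects the first text node child, so a label that appears after a child element would be missed. The sketch below uses a predicate that tests every direct text child and then pulls out both values with regular expressions. The class name, the sample input, and the assumption that the invoice number is purely numeric and the transaction code alphanumeric are mine, not from the question.

```csharp
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

class XPathSketch
{
    static void Main()
    {
        var doc = new HtmlDocument();
        // Trimmed version of the snippet from the question (assumed input).
        doc.LoadHtml(@"<td><b>Billing informations:</b><table><tr><td>
            Invoice-Number:1534753<br />Transactioncode: 1WF772582A4041717
            </td></tr></table></td>");

        // text()[contains(., ...)] tests each direct text child of the cell;
        // the span branch covers the case where the label is wrapped in a span.
        var xpath = "//td[text()[contains(.,'Invoice-Number')] or span[contains(.,'Invoice-Number')]]";
        var cell = doc.DocumentNode.SelectSingleNode(xpath);
        if (cell == null) return;

        var text = cell.InnerText;

        // Assumed formats: digits for the invoice number, word characters for the code.
        var invoice = Regex.Match(text, @"Invoice-Number:\s*(\d+)");
        var code = Regex.Match(text, @"Transactioncode:\s*(\w+)");

        if (invoice.Success) Console.WriteLine(invoice.Groups[1].Value);
        if (code.Success) Console.WriteLine(code.Groups[1].Value);
    }
}
```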

Source: https://habr.com/ru/post/1616187/

