How to parse text from an anonymous block in AngleSharp?

I am parsing the contents of a site using AngleSharp, and I have a problem with an anonymous block (text that sits directly inside an element, without a tag of its own).

See sample code:

var parser = new HtmlParser();
var document = parser.Parse(@"<body>
<div class='product'>
    <a href='#'><img src='img1.jpg' alt=''></a>
    Hello, world
    <div class='comments-likes'>1</div>
</div>
<div class='product'>
    <a href='#'><img src='img2.jpg' alt=''></a>
    Yet another helloworld
    <div class='comments-likes'>25</div>
</div>
</body>");

var products = document.QuerySelectorAll("div.product");
foreach (var product in products)
{
    var productTitle = product.Text();
    productTitle.Dump();
}

As a result, productTitle also contains the number from div.comments-likes. The output is:

Hello, world 1

Yet another helloworld 25

I tried something like product.FirstElementChild.NextElementSibling.Text();, but the next element sibling of the link element is div.comments-likes, not the anonymous block. It outputs:

1

25

So anonymous blocks are skipped. :(

The best workaround I have found is to remove all the interfering blocks first. For my example:

product.QuerySelector(".comments-likes").Remove();
var productTitle = product.Text().Trim();

What is the best way to parse text from an anonymous block?

+4
1 answer

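The anonymous block is a TextNode, which is a node but not an element, so NextElementSibling, which only returns elements, skips it.

Instead, for each product div, you can go through the div's ChildNodes and filter them by NodeType, for example: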

var products = document.QuerySelectorAll("div.product");
foreach (var product in products)
{
    var productTitle = product.ChildNodes
                              .First(o => o.NodeType == AngleSharp.Dom.NodeType.Text 
                                            && o.TextContent.Trim() != "");
    Console.WriteLine(productTitle.TextContent.Trim());
}
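
If a product can contain more than one non-empty text node, First returns only the first of them. A minimal sketch of a variant that joins all direct text nodes is shown below; OwnText and ElementTextExtensions are hypothetical names for illustration, not part of AngleSharp, and the code assumes only the ChildNodes, NodeType and TextContent members already used above:

```csharp
using System.Linq;
using AngleSharp.Dom;

public static class ElementTextExtensions
{
    // Hypothetical helper (not an AngleSharp API): returns the text that
    // sits directly inside the element, ignoring text of child elements.
    public static string OwnText(this IElement element)
    {
        var parts = element.ChildNodes
            // Keep only text nodes; child elements such as <a> or <div> are skipped.
            .Where(node => node.NodeType == NodeType.Text)
            .Select(node => node.TextContent.Trim())
            // Drop whitespace-only nodes (indentation, line breaks).
            .Where(text => text.Length > 0);

        return string.Join(" ", parts);
    }
}
```

With this helper in scope, the loop body could call product.OwnText() instead of product.Text().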



+2

Source: https://habr.com/ru/post/1672055/