I am parsing a site's content with AngleSharp and have run into a problem with an anonymous block (text that sits directly inside an element, without a tag of its own).
See sample code:
var parser = new HtmlParser();
var document = parser.Parse(@"<body>
<div class='product'>
<a href='#'><img src='img1.jpg' alt=''></a>
Hello, world
<div class='comments-likes'>1</div>
</div>
<div class='product'>
<a href='#'><img src='img2.jpg' alt=''></a>
Yet another helloworld
<div class='comments-likes'>25</div>
</div>
</body>");
var products = document.QuerySelectorAll("div.product");
foreach (var product in products)
{
    // Text() returns the text of the element and of all its descendants
    var productTitle = product.Text();
    productTitle.Dump(); // LINQPad output helper
}
So productTitle also contains the number from div.comments-likes. The output (whitespace aside) is:
Hello, world 1
Yet another helloworld 25
I tried something like product.FirstElementChild.NextElementSibling.Text();, but the next element sibling of the link is div.comments-likes, not the anonymous block. It prints:
1
25
So anonymous blocks are skipped. :(
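As far as I can tell, this happens because NextElementSibling only walks element nodes, while the anonymous text is a text node. A minimal sketch of the difference, assuming the same AngleSharp 0.9.x API as the snippet above (HtmlParser.Parse in AngleSharp.Parser.Html; on 1.x the namespace is AngleSharp.Html.Parser and the call is ParseDocument). The SiblingDemo class and the shortened markup are only for illustration:

```csharp
using System;
using AngleSharp.Dom;
using AngleSharp.Parser.Html; // assumption: AngleSharp 0.9.x namespace

static class SiblingDemo
{
    static void Main()
    {
        // Shortened, illustrative version of one product block.
        var document = new HtmlParser().Parse(
            "<div class='product'><a href='#'>link</a>Hello, world" +
            "<div class='comments-likes'>1</div></div>");
        var link = document.QuerySelector("div.product").FirstElementChild;

        // Element-only traversal: jumps straight to div.comments-likes.
        Console.WriteLine(link.NextElementSibling.TextContent); // 1

        // Node-level traversal: the text node that follows the link.
        INode next = link.NextSibling;
        Console.WriteLine(next.NodeType == NodeType.Text);      // True
        Console.WriteLine(next.TextContent);                    // Hello, world
    }
}
```

In the real markup the text node also carries the surrounding line breaks, so it would still need a Trim().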
The best workaround I have found is to remove all the blocks that get in the way. For my example:
product.QuerySelector(".comments-likes").Remove();
var productTitle = product.Text().Trim();
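A possible non-destructive variant, as a sketch only: instead of removing nodes from the document, collect just the text nodes that are direct children of the product element. It assumes AngleSharp 0.9.x (as the Parse call above suggests); OwnText is a hypothetical helper name, not part of AngleSharp.

```csharp
using System;
using System.Linq;
using AngleSharp.Dom;
using AngleSharp.Parser.Html; // assumption: AngleSharp 0.9.x namespace

static class OwnTextDemo
{
    // Hypothetical helper: joins the non-empty text nodes that are
    // direct children of the element, ignoring nested elements.
    public static string OwnText(IElement element)
    {
        var parts = element.ChildNodes
            .Where(node => node.NodeType == NodeType.Text)
            .Select(node => node.TextContent.Trim())
            .Where(text => text.Length > 0);
        return string.Join(" ", parts);
    }

    static void Main()
    {
        var document = new HtmlParser().Parse(@"<body>
<div class='product'>
<a href='#'><img src='img1.jpg' alt=''></a>
Hello, world
<div class='comments-likes'>1</div>
</div>
<div class='product'>
<a href='#'><img src='img2.jpg' alt=''></a>
Yet another helloworld
<div class='comments-likes'>25</div>
</div>
</body>");

        foreach (var product in document.QuerySelectorAll("div.product"))
        {
            // Prints "Hello, world", then "Yet another helloworld".
            Console.WriteLine(OwnText(product));
        }
    }
}
```

This leaves the document untouched, but it only sees text directly under div.product; text wrapped in a nested tag would be missed.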
What is the best way to get the text from an anonymous block?