I have an HTML document and I want to remove all divs with a specific class (including all of their content). What is the easiest way to do this?
Thank you for your help.
UPDATED:
I tried the Html Agility Pack, as advised, but I could not get it to work. I have the following code:
    static void Main()
    {
        HtmlDocument document = new HtmlDocument();
        document.Load(FileName);
        HtmlNode node = document.DocumentNode;
        HandleNode(node);
    }

    private static void HandleNode(HtmlNode node)
    {
        while (node != null)
        {
            if (node.Name == "div")
            {
                var attribute = node.Attributes.Where(x => x.Name == "class" && x.Value == "NavContent");
                if (attribute.Any())
                    node.Remove();
            }
            foreach (var childNode in node.ChildNodes)
            {
                HandleNode(childNode);
            }
        }
    }
But this does not do what I want. The recursion never ends, and the node name is always a comment. Here is the HTML document I am trying to parse:
http://en.wiktionary.org/wiki/work
Is there a good example of how to work with the Html Agility Pack? What is wrong with this piece of code?