
C#, HTML Agility Pack: selecting each paragraph in a div tag

How can I select each paragraph inside a div tag? For example:

 <div id="body_text">
     <p>Hi</p>
     <p>Help Me Please</p>
     <p>Thankyou</p>
 </div>

I have downloaded the HTML Agility Pack and referenced it in my program; all I need is the paragraphs. The number of paragraphs varies, and the page contains many other div tags, but I only need the content of body_text. I assume the result can be stored as a string, which I then want to write to a .txt file for later reference. Thank you.

2 answers

The XPath for your case is //div[@id='body_text']/p

 foreach (HtmlNode node in yourHTMLAgilityPackDocument.DocumentNode.SelectNodes("//div[@id='body_text']/p"))
 {
     // This is the text you are looking for.
     string text = node.InnerText;
 }
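The question also asks how to keep the result as a string and write it to a .txt file, which the loop above does not cover. The following is a minimal sketch of one way to do that, not a definitive implementation. It assumes the HtmlAgilityPack NuGet package is referenced; the class name `BodyTextExtractor`, the method name `ExtractParagraphs`, and the output file name `body_text.txt` are hypothetical and chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using HtmlAgilityPack;

public static class BodyTextExtractor
{
    // Hypothetical helper: returns the text of every <p> that is a direct
    // child of <div id="body_text">, one paragraph per line.
    public static string ExtractParagraphs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@id='body_text']/p");
        if (nodes == null)
        {
            return string.Empty;
        }

        // DeEntitize turns entities such as &amp; back into plain characters.
        return string.Join(
            Environment.NewLine,
            nodes.Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim()));
    }

    public static void Main()
    {
        // Sample markup taken from the question.
        string html =
            "<div id=\"body_text\"><p>Hi</p><p>Help Me Please</p><p>Thankyou</p></div>";

        // Assumed output path; adjust as needed.
        File.WriteAllText("body_text.txt", ExtractParagraphs(html));
    }
}
```

The null check matters because `SelectNodes` yields null when the XPath matches nothing, so iterating over the result directly would throw on a page without a body_text div.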

Here is a solution that collects the paragraphs as an enumerable of HtmlNode objects:

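 using System.Linq;      // needed for Where()
 using HtmlAgilityPack;

 HtmlDocument doc = new HtmlDocument();
 doc.Load("your.html");
 var div = doc.GetElementbyId("body_text");
 // Keep only the direct children that are <p> elements.
 var paragraphs = div.ChildNodes.Where(item => item.Name == "p");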

Without explicit LINQ:

 var paragraphs = doc.GetElementbyId("body_text").Elements("p"); 

Source: https://habr.com/ru/post/1336171/

