You can use the InnerText of the body node (this uses HtmlAgilityPack):
using HtmlAgilityPack;

string html = @"
<html>
<title>title</title>
<body>
<h1> This is a big title.</h1>
How are doing you?
<h3> I am fine </h3>
<img src=""abc.jpg""/>
</body>
</html>";

// Parse the markup and take the text content of <body>, with all tags stripped.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
string text = doc.DocumentNode.SelectSingleNode("//body").InnerText;
You can then collapse runs of spaces and newlines into single spaces (Regex is in System.Text.RegularExpressions):
text = Regex.Replace(text, @"\s+", " ").Trim();
Note, however, that while this works here, InnerText simply strips the tags, so markup such as hello<br>world or hello<i>world</i> comes out as helloworld. This is hard to solve in general, because how text is displayed depends on the CSS and not just on the markup.
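If the missing whitespace matters, one partial workaround is to walk the node tree yourself and insert a space around elements that usually cause a line break. The following is only a sketch under stated assumptions: the class name HtmlTextSketch, the method ToPlainText, and the BreakingTags list are hypothetical and not part of HtmlAgilityPack, and the list is a guess at common default styling that a stylesheet can override.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

static class HtmlTextSketch
{
    // Assumption: tags that normally start a new line with default browser styling.
    // CSS can change this, so treat the list as a heuristic.
    static readonly HashSet<string> BreakingTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "br", "p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
            "li", "ul", "ol", "tr", "td", "th", "table"
        };

    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var sb = new StringBuilder();
        Append(doc.DocumentNode, sb);

        // Same whitespace collapsing as in the answer above.
        return Regex.Replace(sb.ToString(), @"\s+", " ").Trim();
    }

    static void Append(HtmlNode node, StringBuilder sb)
    {
        if (node.NodeType == HtmlNodeType.Comment)
            return;

        if (node.NodeType == HtmlNodeType.Text)
        {
            // Decode entities such as &amp; while copying the text.
            sb.Append(HtmlEntity.DeEntitize(node.InnerText));
            return;
        }

        // Script and style contents are not visible text.
        if (node.Name == "script" || node.Name == "style")
            return;

        bool breaks = BreakingTags.Contains(node.Name);
        if (breaks) sb.Append(' ');
        foreach (HtmlNode child in node.ChildNodes)
            Append(child, sb);
        if (breaks) sb.Append(' ');
    }
}
```

With this, HtmlTextSketch.ToPlainText("hello<br>world") yields hello world, while inline tags such as <i> still add no space. It does not look at CSS at all, so elements hidden or re-laid-out by a stylesheet will still be handled wrongly.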