I tried to parse HTML using HtmlAgilityPack as follows:
```csharp
using HtmlAgilityPack;

HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(xhtmlString);
```
Unfortunately, xhtmlString contains extra whitespace (newlines and tabs), so the `_text` of htmlDoc now looks like this:
```
<html xmlns=\"http://www.w3.org/1999/xhtml\">\n\t<head></head>\n\t<body>\n\n<p>Alle Auktionen<br /></p>\n\n\t</body>\n</html>
```
This gets in the way when I work with the child nodes of `<body>`: the newlines and tabs show up as extra text nodes alongside the `<p>` element.
What is the easiest way to remove these extra characters?
Does HtmlAgilityPack provide a built-in way to strip newlines and tabs from the HTML?
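A minimal sketch of one common workaround, assuming the unwanted whitespace only ever sits between tags: collapse it with a regular expression before calling `LoadHtml`. This uses only `System.Text.RegularExpressions` and is not an HtmlAgilityPack feature; the names `HtmlWhitespace` and `StripInterTagWhitespace` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: not part of HtmlAgilityPack.
public static class HtmlWhitespace
{
    // Matches one or more whitespace characters (spaces, tabs, \r, \n)
    // sitting directly between a '>' and the next '<'.
    private static readonly Regex InterTagWhitespace =
        new Regex(@">\s+<", RegexOptions.Compiled);

    // Returns the markup with whitespace-only gaps between tags removed.
    // Spaces inside text such as "Alle Auktionen" are kept, because they
    // are not bounded by '>' on the left and '<' on the right.
    public static string StripInterTagWhitespace(string html)
    {
        if (html == null)
            throw new ArgumentNullException(nameof(html));

        // Trim also drops whitespace before the first tag and after the last one.
        return InterTagWhitespace.Replace(html.Trim(), "><");
    }
}
```

Applied to the string above, `HtmlWhitespace.StripInterTagWhitespace(xhtmlString)` yields `<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body><p>Alle Auktionen<br /></p></body></html>`, which can then be passed to `htmlDoc.LoadHtml(...)`. Because the pattern works on the raw string with no notion of context, it also strips whitespace inside `<pre>`, `<textarea>`, or `<script>` and between adjacent inline elements (`<b>a</b> <i>b</i>` becomes `<b>a</b><i>b</i>`), so it is only safe when that whitespace is insignificant.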