Remove spaces and newlines when parsing with HtmlAgilityPack

Question

I tried to parse HTML using HtmlAgilityPack as follows:

HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(xhtmlString);

Unfortunately, xhtmlString contains unnecessary spaces and newlines, so the _text field of htmlDoc now looks like this:

<html xmlns=\"http://www.w3.org/1999/xhtml\">\n\t<head></head>\n\t<body>\n\n<p>Alle Auktionen<br /></p>\n\n\t</body>\n</html>

This is a problem for me when working with child elements of the body.

What is the easiest way to remove these extra characters?

Does HtmlAgilityPack offer some kind of function for stripping newlines and tabs from the HTML?

c# asp.net trim html-agility-pack

magnattic Jan 05 '12 at 13:30

1 answer

m.rufca · Answer 1 · 2012-01-05T13:57:25+0000

That is the document's indentation rather than stray spaces and newlines.
I don't see how it would cause a problem, but maybe you could just strip special characters such as "\t" and "\n" from the string?
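
A minimal sketch of that replacement, assuming the markup never relies on tabs or line breaks inside text content (it would also flatten <pre> blocks and join words separated only by a newline). IndentationStripper and StripIndentation are hypothetical names used for illustration; they are not part of HtmlAgilityPack.

```csharp
using System;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class IndentationStripper
{
    // Removes every tab, carriage return and line feed from the markup.
    // Ordinary spaces are left untouched so that "Alle Auktionen" stays intact.
    public static string StripIndentation(string html)
    {
        return html.Replace("\t", "").Replace("\r", "").Replace("\n", "");
    }

    public static void Main()
    {
        string xhtmlString =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n\t<head></head>\n\t<body>\n\n" +
            "<p>Alle Auktionen<br /></p>\n\n\t</body>\n</html>";

        // Prints the markup on a single line, without tabs or newlines.
        Console.WriteLine(StripIndentation(xhtmlString));
    }
}
```

The cleaned string can then be passed to htmlDoc.LoadHtml(...) in place of the original xhtmlString.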

A quick search turned up this related question: "Html Agility Pack: make the code convenient".
It may be useful to set some properties to false.
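
Another possibility, which is my own assumption and not something the linked question or the library documentation prescribes, is to leave the string alone and drop whitespace-only text nodes after loading. The sketch below uses only HtmlDocument.LoadHtml, DocumentNode, Descendants(), NodeType, InnerText and Remove(); the class and method names are hypothetical, and string.IsNullOrWhiteSpace requires .NET 4 or later.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class WhitespaceNodeCleaner
{
    // Removes text nodes that consist only of whitespace (spaces, tabs, newlines).
    public static void RemoveWhitespaceOnlyTextNodes(HtmlDocument doc)
    {
        // Materialize the matches first so the tree is not modified while it is enumerated.
        var emptyTextNodes = doc.DocumentNode
            .Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Text
                        && string.IsNullOrWhiteSpace(n.InnerText))
            .ToList();

        foreach (var node in emptyTextNodes)
        {
            node.Remove();
        }
    }

    public static void Main()
    {
        string xhtmlString =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n\t<head></head>\n\t<body>\n\n" +
            "<p>Alle Auktionen<br /></p>\n\n\t</body>\n</html>";

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(xhtmlString);
        RemoveWhitespaceOnlyTextNodes(htmlDoc);

        // Only element nodes should remain as children of <body>.
        var body = htmlDoc.DocumentNode.SelectSingleNode("//body");
        foreach (var child in body.ChildNodes)
        {
            Console.WriteLine(child.Name);
        }
    }
}
```

This keeps whitespace inside real text such as "Alle Auktionen", but it also removes a lone space between two inline elements, so it is best suited to cases like the one in the question where only the element structure matters.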
