Current documentation says:
Determines whether unclosed nodes are closed at the end or directly in the document. Setting this value to true can actually change the way browsers render the page. The default value is false.
Sorry, I have to admit that I do not understand this paragraph. In particular, "at the end" of what? And what does "in the document" mean? The sentence about changing the way browsers render the page sounds ominous. If the option is set to true and the HTML is well formed, will it still affect the document?
I looked at the source code, but I did not understand what is going on: the code only reacts when the property is not set to true. See HtmlNode.cs and search for OptionAutoCloseOnEnd (line 1707). I also found some funky code in HtmlWeb.cs at lines 1113 and 1154. Too bad the source browser does not display line numbers, but you can search for OptionAutoCloseOnEnd on the page.
Could you illustrate with an example what this option does?
I use HtmlAgilityPack to fix bad HTML and to export page content to XML.
I came across some badly formed HTML with overlapping tags. Here is a snippet:
<p>Blah bah <P><STRONG>Some Text</STRONG><STRONG></p> <UL> <LI></STRONG>Item 1.</LI> <LI>Item 2</LI> <LI>Item 3</LI></UL>
Note that the first p tag is not closed and note the overlapping STRONG tag.
If I set OptionAutoCloseOnEnd to true, the markup gets fixed somehow. I am trying to understand exactly what effect setting this property has on the structure of the document as a whole.
Here is the C# code I'm using:
    HtmlDocument doc = new HtmlDocument();
    doc.OptionOutputAsXml = true;
    doc.OptionFixNestedTags = true;
    // doc.OptionAutoCloseOnEnd = true;
    doc.LoadHtml(htmlText);
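One way to see what the option changes is to parse the snippet above twice, once with OptionAutoCloseOnEnd off and once with it on, and compare the serialized output. The following is only a minimal sketch, assuming the HtmlAgilityPack package is referenced; the AutoCloseDemo class and the Render helper are names made up for this illustration and are not part of the library. No expected output is shown because it depends on the library version.

```csharp
using System;
using HtmlAgilityPack;

class AutoCloseDemo
{
    // Hypothetical helper (not part of HtmlAgilityPack): parses the HTML with
    // the given OptionAutoCloseOnEnd value and returns the serialized document.
    public static string Render(string html, bool autoCloseOnEnd)
    {
        var doc = new HtmlDocument();
        doc.OptionOutputAsXml = true;
        doc.OptionFixNestedTags = true;
        doc.OptionAutoCloseOnEnd = autoCloseOnEnd; // set before loading
        doc.LoadHtml(html);
        return doc.DocumentNode.OuterHtml;
    }

    static void Main()
    {
        // The malformed snippet from the question: unclosed <p>, overlapping <STRONG>.
        string html = "<p>Blah bah <P><STRONG>Some Text</STRONG><STRONG></p> " +
                      "<UL> <LI></STRONG>Item 1.</LI> <LI>Item 2</LI> <LI>Item 3</LI></UL>";

        Console.WriteLine("OptionAutoCloseOnEnd = false:");
        Console.WriteLine(Render(html, false));
        Console.WriteLine();
        Console.WriteLine("OptionAutoCloseOnEnd = true:");
        Console.WriteLine(Render(html, true));
    }
}
```

Comparing the two printed results should show where the parser puts the closing tags for the unclosed p and STRONG elements in each mode.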
Thanks!