Is there a way to tell the C# WebBrowser component not to modify the original HTML?

I noticed that Internet Explorer, which the WebBrowser component uses, changes the page's source code. I understand that some of the markup changes legitimately because of AJAX requests and the like. By "changed" I mean tags that are inserted even though they are not in the actual source. For example, IE closes the body tag when the closing tag is absent, and adds a "tbody" tag when a table does not have one. Is there a way to keep the original structure of the document?

If the question is unclear, let me know. Thanks.

1 answer

I had to make some assumptions, but here is my theory:

What you are seeing is not IE modifying the HTML. I do not know how you obtained the markup, but it is a serialization of IE's DOM tree. IE has to close the body tag (or, to be precise, create a body DOM element) before it can do anything with the page. A serialized DOM is not the original HTML, and if you want the original HTML, you are probably not using the right tool. The same behavior shows up in some WYSIWYG editors and in IE's "Save page" feature. IE simply converts its internal DOM tree back into a string, and the DOM tree has no incomplete elements because it consists of tree nodes, not tags.
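If what you need is the markup exactly as the server sent it, one option is to fetch it over HTTP yourself instead of reading it back from the browser's DOM. The following is only a minimal sketch under my own assumptions, not something taken from the question: the class name `RawHtmlFetcher` and the default URL are hypothetical, and the page is assumed to be reachable with a plain GET, without the cookies or session state that the WebBrowser control may hold.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper: downloads a page's markup as text, bypassing the
// browser's DOM so no tags are inserted or closed by a parser.
class RawHtmlFetcher
{
    static void Main(string[] args)
    {
        // Assumption: the URL is passed on the command line; the default
        // below is only a placeholder.
        string url = args.Length > 0 ? args[0] : "https://example.com/";

        using (var client = new HttpClient())
        {
            // Blocking on the task keeps the sketch compatible with older
            // C# compilers that do not support "async Main".
            string rawHtml = client.GetStringAsync(url).GetAwaiter().GetResult();

            // This is the response body decoded to text. Omitted closing
            // tags and missing tbody elements stay exactly as the server
            // sent them.
            Console.WriteLine(rawHtml);
        }
    }
}
```

Keep in mind that this issues a separate request, so the server may return something different from what the WebBrowser control received (for example, because of a different user agent or missing cookies), and anything that scripts add after the page loads will not be present.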


Source: https://habr.com/ru/post/1403086/

