Parsing XHTML with DTD using XDocument

Question

I need to get plain text from XHTML documents.

I'm sure I read somewhere that XDocument on WP7 does not support DTDs, but I cannot find the source any more. In any case, when I try to parse XHTML that has a DTD using XDocument, it throws a NotSupportedException. The last call in the stack trace is System.Xml.XmlTextReaderImpl.ParseDoctypeDecl().

The result is exactly the same even if I supply a dummy XmlResolver (as suggested in an answer to this question): the resolver is never actually called.

Therefore, I assume that WP7 does not really support it.

I still need to parse XHTML documents, though. So far I have come up with two (more or less workable) solutions:

I can parse the document if I remove the DTD declaration. But XHTML may contain named character entities, and an exception is thrown if an entity is not one of the predefined XML entities. So this solution only works for some XHTML.

I was thinking about using a regex. It is very easy to remove all HTML tags that way, but the entity problem remains, and I do not think there is a real / good solution for replacing all entities.
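For illustration, the first approach can be combined with a small lookup table so that named entities are rewritten as numeric character references before parsing. This is only a minimal sketch under stated assumptions, not a tested WP7 solution: the class name XhtmlText, the method ToPlainText, and the NamedEntities table are hypothetical, the table is deliberately incomplete (XHTML 1.0 defines roughly 250 named entities), and the DOCTYPE pattern assumes there is no internal DTD subset.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class XhtmlText
{
    // Hypothetical, incomplete table: entity name -> Unicode code point.
    // A real table would cover all XHTML 1.0 entity sets.
    private static readonly Dictionary<string, int> NamedEntities =
        new Dictionary<string, int>
        {
            { "nbsp", 160 }, { "copy", 169 }, { "eacute", 233 },
            { "ndash", 8211 }, { "hellip", 8230 }
        };

    public static string ToPlainText(string xhtml)
    {
        // Drop the DOCTYPE declaration so the parser never sees a DTD.
        // Assumption: no internal subset, so the declaration contains no '>'.
        string noDtd = Regex.Replace(
            xhtml, @"<!DOCTYPE[^>]*>", "", RegexOptions.IgnoreCase);

        // Rewrite known named entities as numeric character references.
        // Names missing from the table (including the five predefined XML
        // entities such as &amp;) are left unchanged for the parser.
        string numeric = Regex.Replace(
            noDtd, @"&([A-Za-z][A-Za-z0-9]*);", m =>
            {
                int codePoint;
                return NamedEntities.TryGetValue(m.Groups[1].Value, out codePoint)
                    ? "&#" + codePoint + ";"
                    : m.Value;
            });

        // Root.Value concatenates every text node below the root element,
        // so text inside <head> (for example <title>) is included as well.
        return XDocument.Parse(numeric).Root.Value;
    }
}
```

A call such as XhtmlText.ToPlainText(xhtmlString) would then return the concatenated text. Any entity that is missing from the table still makes XDocument.Parse throw, so the sketch narrows the problem rather than removing it.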

Has anyone come across / solved this? Can you give me advice or correct me if I am wrong? Thanks.


xhtml windows-phone-7 linq-to-xml dtd

jumbo Mar 15 '11 at 18:09


1 answer

Robert · Accepted Answer · 2011-03-15T19:31:34+0000

HTML Agility Pack is a library for parsing HTML documents. According to the discussion forum linked below, there is a version of it for WP7:

http://htmlagilitypack.codeplex.com/discussions/225113
