Loading XML or XHTML content with html encoded or escaped characters

Question

I am developing a class for a content management system. Input content is provided in XHTML format, and it may contain valid escaped characters, such as &#163;. See the example below.

<html xml:lang="en" lang="en" xmlns="http://www.w3.org/1999/xhtml">
  <head xmlns="">
    <meta name="Attr_DocumentTitle" content="Hello World Books" />
  </head>
  <body>
    <div>British Pound   &#163;</div>
    <div>Registered sign &#174;</div>
    <div>Copyright sign &#169; </div>
  </body>
</html>

My goal is to write a method that loads this into a .NET XML object, processes it, and saves it to a database. I want to keep the escaped characters as they are. Here is my method:

public static XmlDocument LoadXmlFromString(string xhtmlContent)
{
    XmlReaderSettings settings = new XmlReaderSettings();
    // Upon loading XML, prevent DTD download, which would be blocked by our
    // firewall and generate a "503 Server Unavailable" error.
    settings.XmlResolver = null;
    settings.ProhibitDtd = false;
    XmlDocument xmlDoc = new XmlDocument();
    // Read the string directly: converting it to Encoding.ASCII bytes first
    // would replace any non-ASCII character with '?'.
    using (XmlReader reader = XmlReader.Create(new StringReader(xhtmlContent), settings))
    {
        // Load through the configured reader so the settings actually apply.
        xmlDoc.Load(reader);
    }
    return xmlDoc; // Value of xmlDoc.InnerXml contains £ ® © in place
                   // of &#163; &#174; and &#169;
}

This method, however, converts the escaped characters to their character equivalents. How can I avoid this and keep the escaped characters?

c# xml .net linq-to-xml c#-4.0

Cleancoder Dec 20 '10 at 18:39

Maxim Gueivandov · Accepted Answer · 2011-01-13T23:25:26+0000
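
One possible approach, given as a minimal sketch rather than a definitive solution: an XML parser always expands a numeric character reference such as &#163; into the character it stands for, so the references cannot be kept inside the loaded XmlDocument itself. They can, however, be reintroduced when the document is written back out. As far as I know, an XmlWriter created over a stream with Encoding.ASCII writes every character that ASCII cannot represent as a numeric character reference. The class name XhtmlSerializer and the method name ToAsciiXml below are hypothetical, and the sketch assumes that getting the references back in hexadecimal form (for example &#xA3; rather than &#163;) is acceptable.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XhtmlSerializer
{
    // Hypothetical helper: serializes the document through an ASCII-encoded
    // XmlWriter so that non-ASCII characters are emitted as numeric
    // character references instead of raw characters.
    public static string ToAsciiXml(XmlDocument doc)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = Encoding.ASCII;   // forces escaping of non-ASCII characters
        settings.OmitXmlDeclaration = true;   // assumption: only the markup is stored

        // A Stream is required here: a writer created over a StringWriter
        // ignores settings.Encoding and writes UTF-16 text unescaped.
        using (MemoryStream stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }
}
```

With this in place, the value saved to the database would be XhtmlSerializer.ToAsciiXml(xmlDoc) instead of xmlDoc.InnerXml. The document can still be processed normally in memory, because the escaping happens only at serialization time.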
