Parsing an XML string containing "&#x20;" (which must be preserved)

I have code that is passed a string containing XML. This XML may contain one or more instances of &#x20; (the character reference for the space character). I have a requirement that these references not be resolved (i.e. they must not be replaced with an actual space character).

Is there any way to achieve this?

Basically, for a string containing the XML:

 <pattern value="[A-Z0-9&#x20;]" /> 

I don't want it to be converted to:

 <pattern value="[A-Z0-9 ]" /> 

(What I'm actually trying to achieve is simply to take an XML string and write it to a "pretty printed" file. This has the side effect of resolving the &#x20; in the string to a single space, whereas the reference needs to be preserved. The reason for this requirement is that the written XML document must conform to an externally defined specification.)

I tried subclassing XmlTextReader to read from the XML string and overriding its ResolveEntity() method, but the override is never called. I also tried assigning a custom XmlResolver.
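For context (this explanation is not from the thread): XmlReader expands numeric character references such as &#x20; itself while parsing. They are not entity references, so no EntityReference node is produced, and ResolveEntity() is in any case a method the consumer calls on such a node, not a callback the reader makes. The minimal sketch below uses a hypothetical helper, ReadValueAttribute, to show what the reader reports for the single-encoded and the double-encoded form.

```csharp
using System;
using System.IO;
using System.Xml;

public static class CharRefDemo
{
    // Hypothetical helper: returns the "value" attribute of the root element
    // exactly as XmlReader reports it, i.e. after reference expansion.
    public static string ReadValueAttribute(string xml)
    {
        using var reader = XmlReader.Create(new StringReader(xml));
        reader.MoveToContent();              // position on the root element
        return reader.GetAttribute("value"); // character references already expanded here
    }
}
```

Calling it with `[A-Z0-9&#x20;]` yields `[A-Z0-9 ]` (the reference has become a space), while `[A-Z0-9&#x26;#x20;]` yields the literal text `[A-Z0-9&#x20;]`. The double-encoded text therefore survives reading, but an XmlWriter will escape its `&` as `&amp;` again when writing.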

I also tried "double encoding", as suggested. Unfortunately this did not have the desired effect: the reader decodes &#x26; to a literal "&", and the writer then escapes that "&" as &amp; again. Here is the code I used:

    using System;
    using System.IO;
    using System.Text;
    using System.Xml;
    using System.Xml.XPath;

    // Double-encoded input: &#x26; is the character reference for '&'.
    string schemaText = @"...<pattern value=""[A-Z0-9&#x26;#x20;]"" />...";

    XmlWriterSettings writerSettings = new XmlWriterSettings();
    writerSettings.Indent = true;
    writerSettings.NewLineChars = Environment.NewLine;
    writerSettings.Encoding = Encoding.Unicode; // ignored when writing to a TextWriter
    writerSettings.CloseOutput = true;
    writerSettings.OmitXmlDeclaration = false;
    writerSettings.IndentChars = "\t";

    StringBuilder writtenSchema = new StringBuilder();
    using ( StringReader sr = new StringReader( schemaText ) )
    using ( XmlReader reader = XmlReader.Create( sr ) )
    using ( TextWriter tr = new StringWriter( writtenSchema ) )
    using ( XmlWriter writer = XmlWriter.Create( tr, writerSettings ) )
    {
        // Load the whole document, then copy it pretty-printed to the writer.
        XPathDocument doc = new XPathDocument( reader );
        XPathNavigator nav = doc.CreateNavigator();
        nav.WriteSubtree( writer );
    }

The written XML ends up containing:

 <pattern value="[A-Z0-9&amp;#x20;]" /> 
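A possible workaround, not taken from the answers in this thread and offered only as a sketch under assumptions: copy the document node by node instead of using XPathNavigator.WriteSubtree, and for the attribute whose spaces must stay as references, emit each space with XmlWriter.WriteCharEntity, which writes a hexadecimal character reference (&#x20; for a space) instead of the literal character. The names XmlPrettyPrinter, PrettyPrintPreservingSpaces and WriteWithSpaceReferences are hypothetical. Only pattern/@value is treated specially, which is an assumption about what the external specification requires, and the XML declaration, DOCTYPE and whitespace-only nodes are dropped for brevity.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlPrettyPrinter
{
    // Hypothetical helper: pretty-prints xml and re-emits every space inside a
    // pattern element's "value" attribute as the character reference &#x20;.
    public static string PrettyPrintPreservingSpaces(string xml)
    {
        var writerSettings = new XmlWriterSettings
        {
            Indent = true,
            IndentChars = "\t",
            NewLineChars = "\n",
            OmitXmlDeclaration = true // the source declaration is not copied in this sketch
        };
        var readerSettings = new XmlReaderSettings { IgnoreWhitespace = true };
        var output = new StringBuilder();

        using (var reader = XmlReader.Create(new StringReader(xml), readerSettings))
        using (var writer = XmlWriter.Create(new StringWriter(output), writerSettings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        bool isEmpty = reader.IsEmptyElement;
                        bool isPattern = reader.LocalName == "pattern"; // matches xs:pattern too
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        while (reader.MoveToNextAttribute())
                        {
                            if (isPattern && reader.LocalName == "value")
                                WriteWithSpaceReferences(writer, reader);
                            else
                                writer.WriteAttributeString(reader.Prefix, reader.LocalName,
                                                            reader.NamespaceURI, reader.Value);
                        }
                        reader.MoveToElement();
                        if (isEmpty) writer.WriteEndElement(); // no EndElement node follows
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.CDATA:
                        writer.WriteCData(reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        writer.WriteComment(reader.Value);
                        break;
                    case XmlNodeType.ProcessingInstruction:
                        writer.WriteProcessingInstruction(reader.Name, reader.Value);
                        break;
                    // Other node types (XmlDeclaration, DocumentType, ...) are skipped.
                }
            }
        }
        return output.ToString();
    }

    // Writes the current attribute, forcing each space out as a character reference.
    private static void WriteWithSpaceReferences(XmlWriter writer, XmlReader reader)
    {
        writer.WriteStartAttribute(reader.Prefix, reader.LocalName, reader.NamespaceURI);
        var run = new StringBuilder();
        foreach (char c in reader.Value)
        {
            if (c == ' ')
            {
                writer.WriteString(run.ToString()); // flush ordinary characters (escaped as usual)
                run.Clear();
                writer.WriteCharEntity(' ');        // emitted as &#x20;
            }
            else
            {
                run.Append(c);
            }
        }
        writer.WriteString(run.ToString());
        writer.WriteEndAttribute();
    }
}
```

With this approach the input stays single-encoded (`[A-Z0-9&#x20;]`); the reader still expands the reference, and the writer puts it back only where it is told to.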
2 answers

If you want it preserved, you need to double-encode it: &amp;#x20;. The XML reader will translate entity and character references; that is more or less how XML works.

 <pattern value="[A-Z0-9&#x26;#x20;]" /> 

What I did above is replace "&" with "&#x26;", thereby escaping the ampersand.


Source: https://habr.com/ru/post/1308756/

