Remove all hexadecimal characters before loading a string into an XML document object?

I have an XML string that is sent to an .ashx handler on the server. The string is built on the client side from several fields entered in a form. Sometimes users copy and paste text from other sources into the web form. When I try to load the string into an XmlDocument object using xmldoc.LoadXml(xmlStr), I get the following exception:

 System.Xml.XmlException = {"'', hexadecimal value 0x0B, is an invalid character. Line 2, position 1."} 

In debug mode, I can see a rogue character (sorry, I'm not sure of its official name):

[Image: rogue character in debug mode]

My questions are: how can I clean the XML string before I try to load it into an XmlDocument object? Do I need a custom function that strips out these characters one by one, or is there a built-in .NET 4 class I can use to remove them?

2 answers

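Here is an example that strips invalid XML characters with Regex:

    xmlString = CleanInvalidXmlChars(xmlString);
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.LoadXml(xmlString);

    public static string CleanInvalidXmlChars(string text)
    {
        // \x takes only two hex digits in .NET, so use \u for larger code points.
        // Removes illegal control characters, U+FFFE/U+FFFF and unpaired surrogates.
        string re = @"[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFE\uFFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]";
        return Regex.Replace(text, re, "");
    }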
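Since the question asks about a built-in .NET 4 class: XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair (both available from .NET 4) can do the same check without a regular expression. A minimal sketch, where the names XmlSanitizer and StripInvalidXmlChars are made up for illustration and are not part of the framework:

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlSanitizer
{
    // Hypothetical helper: keeps only characters that XmlConvert
    // reports as legal in XML 1.0, plus well-formed surrogate pairs.
    public static string StripInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                // Ordinary legal character (IsXmlChar is false for surrogates).
                sb.Append(c);
            }
            else if (i + 1 < text.Length
                     && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // High surrogate followed by a low surrogate: keep both halves.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else (e.g. 0x0B or an unpaired surrogate) is dropped.
        }
        return sb.ToString();
    }
}
```

It would be called the same way as the Regex version, e.g. xmlDoc.LoadXml(XmlSanitizer.StripInvalidXmlChars(xmlString)). Note that this silently drops the offending characters; if they should become spaces instead, append ' ' in the final branch.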

Another way to deal with invalid XML characters is the CheckCharacters flag in XmlReaderSettings. Note that, according to the documentation, setting it to false only turns off checking of character entity references (such as &#x0B;); invalid characters that appear literally in the text are still rejected.

    var xmlDoc = new XmlDocument();
    var xmlReaderSettings = new XmlReaderSettings { CheckCharacters = false };
    using (var stringReader = new StringReader(xml))
    using (var xmlReader = XmlReader.Create(stringReader, xmlReaderSettings))
    {
        xmlDoc.Load(xmlReader);
    }

Source: https://habr.com/ru/post/1204641/

