Paste from Word + create XML document → "the hexadecimal value 0x0C is an invalid character" (.NET)

I have a web page that accepts HTML input from users. The input is converted to an XML document using the System.Xml namespace, for example:

var doc = new XmlDocument();
doc.AppendChild(doc.CreateElement("root"));
doc.DocumentElement.SetAttribute("BodyHTML", theTextBox.Text);

The data is then run through an XSL transform (System.Xml.Xsl.XslCompiledTransform).

Users tend to write their text in Microsoft Word, with bullets, quotation marks, etc. When they paste it into my page, the text contains invalid characters such as 0x0C and 0x03. The XSL transform then fails with the error "the hexadecimal value 0x0C is an invalid character".

My fix so far has been to strip the characters I believe are the offending ones, using a loop and String.Replace: every character from 0 to 31, except 9, 10 and 13, is replaced with String.Empty.
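The fix described above amounts to roughly the following. This is a minimal sketch, and the class and method names (ControlCharStripper, Strip) are made up for illustration:

```csharp
using System;

public static class ControlCharStripper
{
    // Removes the control characters 0-31, except tab (9), line feed (10)
    // and carriage return (13), with one String.Replace call per character.
    public static string Strip(string text)
    {
        for (int code = 0; code < 32; code++)
        {
            if (code == 9 || code == 10 || code == 13)
            {
                continue; // these three are legal in XML 1.0
            }
            text = text.Replace(((char)code).ToString(), String.Empty);
        }
        return text;
    }
}
```

This only covers the low control characters. XML 1.0 also forbids 0xFFFE, 0xFFFF and unpaired surrogates, which this loop does not touch.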

What I'm looking for is a better way to do this. Is there a built-in .NET method? Or maybe just a complete list of the illegal Unicode characters?

1 answer

Found two answers that do the same thing: one uses a StringBuilder, the other uses Regex.Replace.
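The two approaches can be sketched as below. This is not the original code from either answer, only an illustration of each technique built on the Char production of the XML 1.0 specification (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF). The method names are taken from the timing output. Which name belongs to which technique is an assumption based on the timings, and the class name XmlTextCleaner is made up.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class XmlTextCleaner
{
    // Matches illegal low control characters, U+FFFE/U+FFFF, and surrogates
    // that are not part of a valid high+low pair.
    private static readonly Regex InvalidXmlChars = new Regex(
        @"[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFE\uFFFF]" +
        @"|[\uD800-\uDBFF](?![\uDC00-\uDFFF])" +
        @"|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]",
        RegexOptions.Compiled);

    // Regex variant: replace every illegal code unit with nothing.
    public static string CleanInvalidXmlChars(string text)
    {
        return InvalidXmlChars.Replace(text, String.Empty);
    }

    // StringBuilder variant: copy only the legal characters.
    public static string SanitizeXmlString(string xml)
    {
        if (xml == null) throw new ArgumentNullException(nameof(xml));

        var buffer = new StringBuilder(xml.Length);
        for (int i = 0; i < xml.Length; i++)
        {
            char c = xml[i];
            if (char.IsHighSurrogate(c) && i + 1 < xml.Length && char.IsLowSurrogate(xml[i + 1]))
            {
                // A valid surrogate pair encodes U+10000-U+10FFFF, which is legal.
                buffer.Append(c).Append(xml[i + 1]);
                i++;
            }
            else if (IsLegalBmpChar(c))
            {
                buffer.Append(c);
            }
        }
        return buffer.ToString();
    }

    // Legal single-code-unit characters; lone surrogates fall outside these ranges.
    private static bool IsLegalBmpChar(char c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }
}
```

Either method can be applied to the text box value before it is passed to SetAttribute. On .NET Framework 4.0 and later, System.Xml.XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair should be able to replace the hand-written range checks.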

I timed both on a large string (1,8 [...], 1000 iterations) and on a short one ("Hello world", 10 000 000 iterations). The StringBuilder version is about 3 times faster.

Large string:

CleanInvalidXmlChars time: 00:00:07.4356230
SanitizeXmlString    time: 00:00:02.3703305

Short string:

CleanInvalidXmlChars time: 00:00:05.2805834
SanitizeXmlString    time: 00:00:01.8319114
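
Timings like the ones above can be collected with a small Stopwatch harness. The sketch below is an assumption about how such a measurement might be set up, not the code that produced these numbers, and the names CleanerBenchmark, Time and Report are made up:

```csharp
using System;
using System.Diagnostics;

public static class CleanerBenchmark
{
    // Calls `clean` on `input` `iterations` times and returns the elapsed time.
    public static TimeSpan Time(Func<string, string> clean, string input, int iterations)
    {
        clean(input); // warm-up call so JIT compilation is not measured

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            clean(input);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Prints one line in the "<name> time: hh:mm:ss.fffffff" shape.
    public static void Report(string name, Func<string, string> clean, string input, int iterations)
    {
        Console.WriteLine("{0} time: {1}", name, Time(clean, input, iterations));
    }
}
```

For the short-string case the call would look like CleanerBenchmark.Report("SanitizeXmlString", SanitizeXmlString, "Hello world", 10000000), assuming a method of that name with a string-to-string signature is in scope.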

Source: https://habr.com/ru/post/1746386/

