I think I found the problem. By default, XmlSerializer will allow you to generate invalid XML.
Given this code:

    // Requires: using System.IO; using System.Xml.Serialization;
    var input = "\u001a";
    var writer = new StringWriter();
    var serializer = new XmlSerializer(typeof(string));
    serializer.Serialize(writer, input);
    Console.WriteLine(writer.ToString());

The output is:

    <?xml version="1.0" encoding="utf-16"?>
    <string>&#x1A;</string>
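This is invalid XML. According to the XML specification, every character reference must refer to a valid character. In XML 1.0, the valid characters are:

    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]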
As you can see, U+001A (like every other C0 control character except tab, line feed, and carriage return) is not a valid character, so a reference to it is not allowed either.
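To make the rule concrete, here is a minimal sketch of the XML 1.0 character ranges written as a predicate. The names `XmlChars` and `IsValidXml10Char` are made up for illustration and are not part of .NET.

```csharp
using System;

// Hypothetical helper (not a .NET API): mirrors the XML 1.0 Char production.
public static class XmlChars
{
    // Takes a Unicode code point rather than a UTF-16 code unit, so the
    // supplementary range #x10000-#x10FFFF can be checked as well.
    public static bool IsValidXml10Char(int codePoint)
    {
        return codePoint == 0x9
            || codePoint == 0xA
            || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }
}
```

With this sketch, `XmlChars.IsValidXml10Char(0x1A)` is false, while `XmlChars.IsValidXml10Char(0x9)` is true.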
The error message given by the decoder is a bit misleading; it would be clearer if it said that there is an invalid character reference.
There are several options for what you can do.
1) Do not let XmlSerializer create invalid documents in the first place
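You can use an XmlWriter, which by default will not allow invalid characters:

    var input = "\u001a";
    var writer = new StringWriter();
    var serializer = new XmlSerializer(typeof(string));
    // Write through an XmlWriter (System.Xml) instead of the StringWriter directly:
    using (var xmlWriter = XmlWriter.Create(writer))
        serializer.Serialize(xmlWriter, input);
    Console.WriteLine(writer.ToString());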
This will throw an exception when serialization happens. You will need to handle that exception and display an appropriate error.
This is probably not useful for you because you already have data with these invalid characters.
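If you do go this route, handling the failure could look roughly like the sketch below. It assumes that XmlSerializer reports writer failures as an InvalidOperationException wrapping the original error, which is my understanding of its behavior; `SafeXml` and `TrySerialize` are hypothetical names, not .NET APIs.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical wrapper for illustration only.
public static class SafeXml
{
    // Returns false instead of producing a document with invalid characters.
    public static bool TrySerialize(string input, out string xml)
    {
        var writer = new StringWriter();
        var serializer = new XmlSerializer(typeof(string));
        try
        {
            // XmlWriter.Create checks characters by default
            // (XmlWriterSettings.CheckCharacters is true).
            using (var xmlWriter = XmlWriter.Create(writer))
            {
                serializer.Serialize(xmlWriter, input);
            }
            xml = writer.ToString();
            return true;
        }
        catch (InvalidOperationException)
        {
            // Assumption: the serializer wraps the writer's invalid-character
            // error in an InvalidOperationException.
            xml = string.Empty;
            return false;
        }
    }
}
```

Calling `SafeXml.TrySerialize("\u001a", out var xml)` should then return false, at which point you can show your own error message.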
or 2) Remove references to this invalid character
That is, instead of .Replace((char)0x1a, ' '), which does not actually replace anything in your document (the document contains a character reference, not the raw character), use .Replace("&#x1A;", " "). (This match is case-sensitive, but the uppercase form is what .NET generates. A more robust solution would be to use a case-insensitive regular expression.)
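A minimal sketch of the regular-expression variant follows. The pattern also accepts leading zeros and the decimal form `&#26;`, which goes slightly beyond what .NET itself emits; `XmlCleanup` and `StripSubReferences` are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class XmlCleanup
{
    // Matches character references to U+001A: hex (&#x1A;, &#x1a;, &#x001A;)
    // or decimal (&#26;). The trailing ';' keeps longer references such as
    // &#x1AB; from matching.
    private static readonly Regex SubReference = new Regex(
        @"&#(?:x0*1A|0*26);",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Replaces each matching reference with a single space.
    public static string StripSubReferences(string xml)
    {
        return SubReference.Replace(xml, " ");
    }
}
```

Run it over the stored XML text before handing it to the deserializer. Note that it works on raw text, so it would also rewrite the same sequence inside a CDATA section.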
As an aside, XML 1.1 actually allows control characters like this one, as long as they appear as character references rather than as literal characters in the document. That would solve your problem, except that the .NET XmlSerializer does not support version 1.1.