We have various XML files produced by an application that is already in distribution. Some of these files turn out to contain invalid characters, which makes them invalid XML. In most cases they will not load unless all checks are disabled, and even then only into XmlDocument instances, not XDocument.
Since this application is already out there, we have to deal with the files it creates. I could write a Sanitizer type that knows what to look for and how to fix it before trying to load the document, but I was hoping someone had already made the effort to build something that does this efficiently (e.g. a SanitizedXmlReader class).
This question touches on the same topic, but I did not find a satisfactory answer there. All we want to do is remove invalid content wherever it appears in the XML file, while leaving alone data that is valid (such as content inside CDATA, or characters that are only illegal when used in a QName).
So, is there something that can take an "almost" XML file and turn it into an XML file that is at least free of invalid characters? If not, writing our own is the next option. In that case, rather than spending time interpreting the XML specification to work out which characters are illegal in which situations, is there a definitive list?
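To make concrete what "writing our own" would mean, here is a minimal sketch of the kind of filter I have in mind. It is written in Python only for brevity and is not tied to any .NET API. It assumes the only problem is raw characters outside the Char production of the XML 1.0 specification (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF). The function name strip_invalid_xml_chars is made up for illustration. It does not deal with numeric character references such as &#x0;, nor with the stricter rules for names.

```python
import re

# Matches any character that is NOT allowed by the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with every character outside the XML 1.0 Char range removed.

    Hypothetical helper: it only filters raw characters and knows nothing
    about markup, so it treats element content, attributes and CDATA alike.
    """
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    # NUL and vertical tab are dropped; tab and newline are kept.
    damaged = "<item>a\x00b\x0Bc\td</item>\n"
    print(repr(strip_invalid_xml_chars(damaged)))
```

Run over the decoded file contents before parsing, something like this would presumably get such files past the character check, but I would still prefer an existing, well-tested reader over maintaining the ranges myself.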