I project some data as XML from SQL Server using ADO.NET. Some of the data contains characters that are invalid in XML, such as CHAR(7) (known as BEL):
SELECT 'This is BEL: ' + CHAR(7) AS A FOR XML RAW
SQL Server encodes such invalid characters as numeric character references:
<row A="This is BEL: &#x07;" />
However, even the encoded form is invalid in XML 1.0 and will lead to errors in XML parsing:
var doc = XDocument.Parse("<row A=\"This is BEL: &#x07;\" />"); // throws XmlException
I would like to replace all of these invalid numeric references to the Unicode replacement character ' '. I know how to do this for unencoded XML:
string str = "<row A=\"This is BEL: \u0007\" />";
if (str.Any(c => !XmlConvert.IsXmlChar(c)))
    str = new string(str.Select(c => XmlConvert.IsXmlChar(c) ? c : '\uFFFD').ToArray());
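Is there a simple way to do the same for encoded XML? I would prefer to avoid running HtmlDecode and then HtmlEncode over the whole string, if possible.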
I would prefer a solution in C#, not in SQL.
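For illustration, one possible approach to the encoded form is a regular expression that finds every numeric character reference, decodes its code point, and swaps the reference for U+FFFD when the code point falls outside the XML 1.0 Char range. This is only a sketch: the class name XmlCharRefSanitizer and the method ReplaceInvalidRefs are hypothetical names, and it assumes the input contains no CDATA sections or comments, where a sequence such as &#x07; would be literal text rather than a reference.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper, not part of any library.
public static class XmlCharRefSanitizer
{
    // Matches hexadecimal (&#x07;) and decimal (&#7;) numeric character references.
    private static readonly Regex NumericRef =
        new Regex(@"&#(?:x([0-9A-Fa-f]+)|([0-9]+));", RegexOptions.Compiled);

    public static string ReplaceInvalidRefs(string xml)
    {
        return NumericRef.Replace(xml, m =>
        {
            bool isHex = m.Groups[1].Success;
            // Strip leading zeros so long zero-padded references do not overflow the parse.
            string digits = (isHex ? m.Groups[1].Value : m.Groups[2].Value).TrimStart('0');
            NumberStyles style = isHex ? NumberStyles.AllowHexSpecifier : NumberStyles.None;

            // An empty string (code point 0) or an overflowing value fails to parse
            // and is treated as invalid.
            if (!uint.TryParse(digits, style, CultureInfo.InvariantCulture, out uint codePoint))
                return "\uFFFD";

            // Keep valid references untouched; replace invalid ones with U+FFFD.
            return IsValidXmlCodePoint(codePoint) ? m.Value : "\uFFFD";
        });
    }

    // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static bool IsValidXmlCodePoint(uint cp)
    {
        if (cp > 0xFFFF) return cp <= 0x10FFFF;
        return XmlConvert.IsXmlChar((char)cp);
    }
}
```

Calling XmlCharRefSanitizer.ReplaceInvalidRefs on the string returned by SQL Server before passing it to XDocument.Parse should leave valid references such as &#x0A; intact and replace only the invalid ones, without decoding and re-encoding the rest of the document.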