Illegal character in XML document

I have a program that generates XML files from database data. In short, the code does the following:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.Xml;

string dsn = "a db connection string";
XmlDocument d = new XmlDocument();
using (SqlConnection con = new SqlConnection(dsn))
{
    con.Open();
    string sql = "select id as Id, comment as Comment from Test where ... ";
    using (SqlCommand cmd = new SqlCommand(sql, con))
    {
        // Fill a DataSet named "EXPORT" with one table named "Test".
        DataSet ds = new DataSet("EXPORT");
        SqlDataAdapter da = new SqlDataAdapter(cmd);
        da.Fill(ds, "Test");
        // Load the DataSet's XML representation into the XmlDocument.
        d.LoadXml(ds.GetXml());
    }
}
d.Save(@"c:\test.xml");
```

When I look at the XML file, it contains an invalid character, &#x1A;:

```xml
<EXPORT>
  <Test>
    <Id>2</Id>
    <Comment>Keyboard NB&#x1A;5 linked</Comment>
  </Test>
</EXPORT>
```

Firefox cannot open this XML file; it reports an invalid character...

This character is reserved in ISO 8859-1 and CP1252 and should not be displayed by browsers. But why does XmlDocument output XML that cannot be parsed as valid? Or is it a valid XML document that simply cannot be parsed by browsers or imported by Excel and so on? Is there an easy way to get rid of these reserved "invalid characters", or to encode them so that browsers have no problem with them?

Thanks so much for your feedback and tips.

+5
6 answers

Not every character can be represented in XML.

In XML 1.0, no character with a value below 0x20 may be used except TAB (0x09), LF (0x0A), and CR (0x0D).

In XML 1.1, almost everything is allowed except NUL (0x00).
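A minimal C# sketch of these two rules, written against the `Char` productions of the W3C XML 1.0 and XML 1.1 specifications (plus `RestrictedChar` for 1.1). The function names are hypothetical and it assumes C# 9+ top-level statements. One refinement over the summary above: in XML 1.1 the control characters are "restricted", meaning they are legal only as character references such as `&#x1A;`, never as literal characters.

```csharp
using System;

// XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
static bool IsXml10Char(int cp) =>
    cp == 0x9 || cp == 0xA || cp == 0xD ||
    (cp >= 0x20 && cp <= 0xD7FF) ||
    (cp >= 0xE000 && cp <= 0xFFFD) ||
    (cp >= 0x10000 && cp <= 0x10FFFF);

// XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] (only NUL is excluded)
static bool IsXml11Char(int cp) =>
    (cp >= 0x1 && cp <= 0xD7FF) ||
    (cp >= 0xE000 && cp <= 0xFFFD) ||
    (cp >= 0x10000 && cp <= 0x10FFFF);

// XML 1.1 RestrictedChar: allowed only when written as a character reference.
static bool IsXml11RestrictedChar(int cp) =>
    (cp >= 0x1 && cp <= 0x8) || cp == 0xB || cp == 0xC ||
    (cp >= 0xE && cp <= 0x1F) ||
    (cp >= 0x7F && cp <= 0x84) || (cp >= 0x86 && cp <= 0x9F);

int sub = 0x1A; // the character from the question (SUB, Ctrl-Z)
Console.WriteLine($"XML 1.0 Char: {IsXml10Char(sub)}");                 // False
Console.WriteLine($"XML 1.1 Char: {IsXml11Char(sub)}");                 // True
Console.WriteLine($"XML 1.1 restricted: {IsXml11RestrictedChar(sub)}"); // True
```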

If you are able to use XML 1.1 and the receiving program supports it (not many do), you can escape 0x1A as &#26; or &#x1A;.
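For a consumer that really does accept XML 1.1 (the document must then declare `<?xml version="1.1"?>`), the escaping could look roughly like this. As far as I know, System.Xml only produces XML 1.0, so this is a hand-rolled, hypothetical helper (`EscapeXml11Text`) that handles text content only; attribute values would additionally need quotes escaped.

```csharp
using System;
using System.Text;

// Hypothetical helper: escapes element text for an XML 1.1 consumer.
// Markup characters become entity references; XML 1.1 "restricted" control
// characters (C0/C1 except TAB, LF, CR and NEL) become references like &#x1A;.
static string EscapeXml11Text(string text)
{
    var sb = new StringBuilder(text.Length);
    foreach (char c in text)
    {
        switch (c)
        {
            case '&': sb.Append("&amp;"); break;
            case '<': sb.Append("&lt;"); break;
            case '>': sb.Append("&gt;"); break;
            case '\0':
                // NUL is illegal in XML 1.1 even as a character reference.
                throw new ArgumentException("NUL cannot be represented in XML 1.1.");
            default:
                bool restricted =
                    (c >= '\u0001' && c <= '\u0008') || c == '\u000B' || c == '\u000C' ||
                    (c >= '\u000E' && c <= '\u001F') ||
                    (c >= '\u007F' && c <= '\u0084') || (c >= '\u0086' && c <= '\u009F');
                if (restricted)
                    sb.Append("&#x").Append(((int)c).ToString("X")).Append(';');
                else
                    sb.Append(c);
                break;
        }
    }
    return sb.ToString();
}

Console.WriteLine(EscapeXml11Text("Keyboard NB\u001A5 linked")); // Keyboard NB&#x1A;5 linked
```

The same `&#x1A;` output in an XML 1.0 document is still not well-formed, which is exactly the file from the question.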

Wrapping the value in CDATA is not a solution either; CDATA is simply a convenience for escaping a group of characters without the standard &-mechanism, and it does not make otherwise illegal characters legal.
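A quick way to see that CDATA does not help: the sketch below (hypothetical helper name, default `XmlReaderSettings`, C# 9+ top-level statements) parses two CDATA sections with `XmlReader`. The one containing a literal U+001A should be rejected with an `XmlException`, while ordinary markup characters inside CDATA are fine.

```csharp
using System;
using System.IO;
using System.Xml;

// Returns true if the document parses with the default (strict) XmlReader settings.
static bool ParsesStrictly(string xml)
{
    try
    {
        using var reader = XmlReader.Create(new StringReader(xml));
        while (reader.Read()) { } // read to the end; errors surface as XmlException
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}

// A literal U+001A inside a CDATA section is still an illegal XML 1.0 character.
string withCData = "<Comment><![CDATA[Keyboard NB\u001A5 linked]]></Comment>";
Console.WriteLine(ParsesStrictly(withCData));
Console.WriteLine(ParsesStrictly("<Comment><![CDATA[a & b < c]]></Comment>"));
```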

Otherwise, you will have to remove the character before serialization.
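A minimal sketch of stripping the character before serialization, assuming .NET 4.0 or later, where `XmlConvert.IsXmlChar` and `XmlConvert.IsXmlSurrogatePair` are available. The helper name `StripInvalidXmlChars` is hypothetical; it keeps only characters that are legal in XML 1.0, including valid surrogate pairs for characters above U+FFFF. In the question's code it would be applied to the string values before `ds.GetXml()` is called.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper: removes every character that is not legal in XML 1.0.
static string StripInvalidXmlChars(string text)
{
    if (string.IsNullOrEmpty(text)) return text;
    var sb = new StringBuilder(text.Length);
    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        if (XmlConvert.IsXmlChar(c))
        {
            sb.Append(c);
        }
        else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
        {
            // Keep a valid high/low surrogate pair (a character above U+FFFF).
            sb.Append(c).Append(text[i + 1]);
            i++;
        }
        // Anything else, such as U+001A or a lone surrogate, is dropped.
    }
    return sb.ToString();
}

Console.WriteLine(StripInvalidXmlChars("Keyboard NB\u001A5 linked")); // Keyboard NB5 linked
```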

+3

I have run into this several times when creating/processing XML from SQL data.

You asked: "But why does XmlDocument output XML that cannot be parsed as valid, or is it a valid XML document that simply cannot be parsed by browsers or imported by Excel, etc.?"

XmlDocument does not validate the data you give it; it leaves that to you (the developer). This XML document should be rejected by almost every consumer of XML (though I could be wrong about that... you can always check :P).
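One way to check this yourself, sketched in C# (hypothetical helper `Load`, C# 9+ top-level statements): loading the exported XML through `XmlReader` with the default `CheckCharacters = true` should fail on `&#x1A;`, while `CheckCharacters = false` makes the .NET reader accept the character reference. The lenient setting only helps .NET consumers; it does nothing for other programs such as the browser in the question.

```csharp
using System;
using System.IO;
using System.Xml;

const string Xml = "<EXPORT><Test><Id>2</Id><Comment>Keyboard NB&#x1A;5 linked</Comment></Test></EXPORT>";

// Loads the document through an XmlReader with the given CheckCharacters setting.
static XmlDocument Load(string xml, bool checkCharacters)
{
    var settings = new XmlReaderSettings { CheckCharacters = checkCharacters };
    using var reader = XmlReader.Create(new StringReader(xml), settings);
    var doc = new XmlDocument();
    doc.Load(reader);
    return doc;
}

try
{
    Load(Xml, checkCharacters: true); // the default for XmlReader.Create
    Console.WriteLine("strict: loaded");
}
catch (XmlException ex)
{
    Console.WriteLine("strict: " + ex.Message); // reports 0x1A as an invalid character
}

// Lenient: the reference &#x1A; is accepted and expanded to U+001A.
XmlDocument lenient = Load(Xml, checkCharacters: false);
Console.WriteLine(lenient.SelectSingleNode("//Comment")!.InnerText.IndexOf('\u001A') >= 0);
```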

Almost every time I hit this problem, I ended up replacing the offending data with either an appropriate substitute character (if there is one) or simply removing it.
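A sketch of the replace-or-remove approach applied to the question's `DataSet` before `GetXml()` is called. `ReplaceInvalidXmlChars` and `SanitizeDataSet` are hypothetical helpers; the replacement string is your choice (an empty string removes the character). An in-memory table stands in for the SQL query so the example runs without a database.

```csharp
using System;
using System.Data;
using System.Text;
using System.Xml;

// Hypothetical helper: replaces characters that are illegal in XML 1.0 with
// `replacement` (pass "" to drop them). Valid surrogate pairs are kept.
static string ReplaceInvalidXmlChars(string text, string replacement)
{
    var sb = new StringBuilder(text.Length);
    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        if (XmlConvert.IsXmlChar(c))
            sb.Append(c);
        else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            sb.Append(c).Append(text[++i]);
        else
            sb.Append(replacement);
    }
    return sb.ToString();
}

// Hypothetical helper: cleans every string column of every table in place.
static void SanitizeDataSet(DataSet ds, string replacement)
{
    foreach (DataTable table in ds.Tables)
        foreach (DataRow row in table.Rows)
            foreach (DataColumn col in table.Columns)
                if (col.DataType == typeof(string) && row[col] is string s)
                    row[col] = ReplaceInvalidXmlChars(s, replacement);
}

// Usage: an in-memory table standing in for the result of the SQL query.
var ds = new DataSet("EXPORT");
DataTable t = ds.Tables.Add("Test");
t.Columns.Add("Id", typeof(int));
t.Columns.Add("Comment", typeof(string));
t.Rows.Add(2, "Keyboard NB\u001A5 linked");

SanitizeDataSet(ds, " "); // replace with a space; "" would remove it instead
var d = new XmlDocument();
d.LoadXml(ds.GetXml());
Console.WriteLine(d.OuterXml);
```

Cleaning the `DataSet` values rather than the finished XML string avoids having to distinguish text content from markup.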

You can also try wrapping your XML in a CDATA block, but this inflates the file a little (I'm not sure how large your file will be).

+1

Take a look at this question: "XML parsing error on illegal character".

Conclusion (as I understand it): with XML 1.0, it is not possible to store this character.

+1

Take a look at this answer to see if it helps:

.NET DataSet.GetXml() - what is the default encoding?

0

My guess is that you are dealing with a Control-Z character (0x1A, the end-of-file marker for text files). Could that be it?
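To confirm which control characters the database actually returns, a small diagnostic like the one below could help. `FindControlChars` is a hypothetical name; it reports every character below U+0020 other than TAB, LF and CR, which are exactly the characters XML 1.0 forbids in that range. Ctrl-Z shows up as 0x1A.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical diagnostic: lists the index and code of every C0 control
// character except TAB, LF and CR (the ones XML 1.0 forbids below 0x20).
static List<(int Index, int Code)> FindControlChars(string text)
{
    var hits = new List<(int Index, int Code)>();
    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        if (c < '\u0020' && c != '\t' && c != '\n' && c != '\r')
            hits.Add((i, (int)c));
    }
    return hits;
}

foreach (var (index, code) in FindControlChars("Keyboard NB\u001A5 linked"))
    Console.WriteLine($"index {index}: 0x{code:X2}"); // index 11: 0x1A
```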

0

Make sure you escape XML special characters, e.g. & => &amp;. Otherwise, wrap the data in CDATA: http://en.wikipedia.org/wiki/CDATA
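Entity escaping only covers the markup characters. As an illustration, assuming .NET's `System.Security.SecurityElement.Escape` as the escaping routine, `&` becomes `&amp;` but U+001A passes through untouched, so it still has to be removed or replaced separately.

```csharp
using System;
using System.Security;

// SecurityElement.Escape replaces < > " ' & with entity references;
// every other character, including U+001A, is copied through unchanged.
string raw = "Keyboard NB\u001A5 & linked";
var escaped = SecurityElement.Escape(raw);
Console.WriteLine(escaped.Contains("&amp;"));      // True
Console.WriteLine(escaped.IndexOf('\u001A') >= 0); // True
```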

-1

Source: https://habr.com/ru/post/912214/

