Why does XmlReader's default encoding handling differ from XmlTextReader's?

I have an XML file that is ASCII encoded.

I tried reading it with two different Microsoft XML readers:

  • XmlReader.Create(new StreamReader(fileImport.FileContent, true));
  • new XmlTextReader(fileImport.FileContent)

The first, XmlReader.Create, which uses a StreamReader to decode the file, works great.

The second, the new XmlTextReader, throws an XmlException with the message "Invalid character in this encoding."

According to the MSDN documentation, both are supposed to determine the encoding from the byte order marks and, if none are present, fall back to UTF-8.

XmlTextReader [msdn] XmlTextReader.Encoding property

StreamReader [msdn] StreamReader constructor

So why does XmlTextReader fail with an invalid-encoding error while StreamReader does not, when the documentation says both implementations handle encoding the same way by default?

1 answer

They work the same way, but you are not using them the same way. In the first case, you pass a StreamReader as the parameter, and in the second, you pass the location of the file.

When you create an XmlReader on a TextReader (e.g. a StreamReader), it always uses the TextReader's encoding and ignores the value of the encoding attribute in the XML declaration. When you simply pass a path or stream, it uses the encoding attribute in the XML declaration.

In your case, I suspect that the declared encoding does not match the actual encoding of the file. I was able to reproduce your problem by creating an XML file that declares its encoding as UTF-8, but is actually encoded as ANSI. If the file contains non-ASCII characters, I get the same error. But if I fix the encoding in the XML declaration, it works fine ...
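The reproduction described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `EncodingDemo`, its helper methods, and the sample document are hypothetical, and ISO-8859-1 stands in for the "ANSI" encoding of the real file.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class; none of these names come from the original post.
public static class EncodingDemo
{
    // Builds a document that declares UTF-8 but is actually saved as
    // ISO-8859-1, so the character 'é' is stored as the single byte 0xE9.
    public static byte[] BuildMislabeledXml()
    {
        string xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><root>caf\u00e9</root>";
        return Encoding.GetEncoding("iso-8859-1").GetBytes(xml);
    }

    // Raw stream: the XML reader chooses the encoding itself
    // (byte order mark, then the XML declaration).
    public static string ReadFromStream(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = XmlReader.Create(stream))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }

    // TextReader: the StreamReader has already decoded the bytes,
    // so the declared encoding is not used for decoding.
    public static string ReadFromTextReader(byte[] bytes, Encoding encoding)
    {
        using (var stream = new MemoryStream(bytes))
        using (var text = new StreamReader(stream, encoding))
        using (var reader = XmlReader.Create(text))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }

    public static void Main()
    {
        byte[] bytes = BuildMislabeledXml();

        try
        {
            Console.WriteLine(ReadFromStream(bytes));
        }
        catch (XmlException ex)
        {
            // Expected here: 0xE9 is not a valid UTF-8 sequence on its own.
            Console.WriteLine("Stream overload failed: " + ex.Message);
        }

        // Decoding with the file's real encoding succeeds.
        Console.WriteLine(ReadFromTextReader(bytes, Encoding.GetEncoding("iso-8859-1")));
    }
}
```

Note that `new StreamReader(stream, true)`, as used in the question, falls back to UTF-8 when no byte order mark is found and, as far as I know, replaces undecodable bytes with U+FFFD instead of throwing. So the first approach may "work" while silently corrupting the non-ASCII characters; correcting the declaration or passing the real encoding is the safer fix.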


Source: https://habr.com/ru/post/1433062/

