I have UTF-8 text in a file utf8.txt. The file contains some characters that are outside the ASCII range. I tried the following code:
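```csharp
using System.Text;

var fname = "utf8.txt";
// Ask for ISO-8859-1 and request an exception for any byte that cannot be decoded.
var enc = Encoding.GetEncoding("ISO-8859-1", EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);
var s = System.IO.File.ReadAllText(fname, enc);
```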
I expected this code to throw an exception, because the file's contents are not valid ISO-8859-1 text. Instead, it correctly decodes the UTF-8 text into the intended characters (the string looks correct in the debugger).
Is this a bug in .Net?
EDIT:
The file I tested with was UTF-8 with a BOM (byte order mark). If I remove the BOM, the behavior changes: it still does not throw an exception, but it produces the wrong Unicode string (the string does not look correct in the debugger).
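A minimal sketch of why no exception appears in the no-BOM case, assuming .NET's ISO-8859-1 encoding maps each byte 0x00 to 0xFF to the code point with the same value (U+0000 to U+00FF), which is how Latin-1 is defined. A decoder fallback only runs for bytes that have no mapping, so with ISO-8859-1 the exception fallback never fires. The class name `Latin1Demo` is only for illustration.

```csharp
using System;
using System.Text;

class Latin1Demo
{
    static void Main()
    {
        var enc = Encoding.GetEncoding("ISO-8859-1", EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        // Every possible byte value, 0x00 through 0xFF.
        var allBytes = new byte[256];
        for (int i = 0; i < 256; i++) allBytes[i] = (byte)i;

        // Does not throw: ISO-8859-1 has a character for every byte value.
        string decoded = enc.GetString(allBytes);
        Console.WriteLine(decoded.Length); // 256

        // The UTF-8 bytes for "ê" (C3 AA) become two Latin-1 characters, "Ãª".
        string mojibake = enc.GetString(new byte[] { 0xC3, 0xAA });
        Console.WriteLine(mojibake == "\u00C3\u00AA"); // True
    }
}
```

This is why the result is silently wrong rather than an error: there is no byte sequence that ISO-8859-1 rejects.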
EDIT:
To create a test file, run the following code:
```csharp
var fname = "utf8.txt";
// EF BB BF is the UTF-8 BOM; C3 AA is "ê" (U+00EA) encoded as UTF-8.
var utf8_bom_e_circumflex_bytes = new byte[] {0xEF, 0xBB, 0xBF, 0xC3, 0xAA};
System.IO.File.WriteAllBytes(fname, utf8_bom_e_circumflex_bytes);
```
EDIT:
I think I have a solid handle on what is happening (although I disagree with part of the .Net behavior).
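If the file is UTF-8 with a BOM, ReadAllText detects the BOM and decodes the file as UTF-8, ignoring the encoding I passed in. (This is documented: ReadAllText tries to detect the encoding from byte order marks, and UTF-8 is one of the encodings it recognizes.) That explains the first case. The part I disagree with is that .Net silently uses UTF-8 here even though I explicitly asked for ISO-8859-1.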
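A sketch of the BOM behavior, assuming the documented behavior of `File.ReadAllText(path, encoding)`: it checks for a byte order mark first and only falls back to the encoding argument when none is found. `StreamReader` accepts `detectEncodingFromByteOrderMarks: false` to turn that detection off. The temporary file and the `BomDemo` class are illustrative only.

```csharp
using System;
using System.IO;
using System.Text;

class BomDemo
{
    static void Main()
    {
        var enc = Encoding.GetEncoding("ISO-8859-1", EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        // Same bytes as the test file: UTF-8 BOM (EF BB BF) followed by "ê" (C3 AA).
        var path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0xEF, 0xBB, 0xBF, 0xC3, 0xAA });

        // ReadAllText sees the UTF-8 BOM and decodes as UTF-8; the ISO-8859-1 argument is not used.
        string viaReadAllText = File.ReadAllText(path, enc);
        Console.WriteLine(viaReadAllText == "\u00EA"); // True

        // With BOM detection off, all five bytes (BOM included) are decoded as ISO-8859-1.
        string viaLatin1;
        using (var reader = new StreamReader(path, enc, detectEncodingFromByteOrderMarks: false))
        {
            viaLatin1 = reader.ReadToEnd();
        }
        Console.WriteLine(viaLatin1 == "\u00EF\u00BB\u00BF\u00C3\u00AA"); // True

        File.Delete(path);
    }
}
```

Reading the bytes yourself and calling `enc.GetString(File.ReadAllBytes(path))` would also bypass BOM detection.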
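Without the BOM, .Net (I am not 100% sure about this) decodes the file as ISO-8859-1, and because (I believe) every byte value is a valid ISO-8859-1 character, the UTF-8 bytes are decoded as ISO-8859-1 without any error. So no exception is thrown, but the resulting string is wrong.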