System.IO.File.ReadAllText does not throw an exception for invalid encoding

I have UTF-8 text in a file utf8.txt. The file contains some characters that are outside the ASCII range. I tried the following code:

var fname = "utf8.txt";
var enc = Encoding.GetEncoding("ISO-8859-1", EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);
var s = System.IO.File.ReadAllText(fname, enc);

I expected this code to throw an exception, because (as I understood it) the file's contents are not valid ISO-8859-1. Instead, it correctly decodes the UTF-8 text into the expected characters (the string looks right in the debugger).

Is this a bug in .Net?

EDIT:

The file I tested with was UTF-8 with a BOM. If I remove the BOM, the behavior changes: it still does not throw an exception, but it produces the wrong Unicode string (the string doesn't look correct in the debugger).

EDIT:

To create a test file, run the following code:

var fname = "utf8.txt";
// UTF-8 BOM (EF BB BF) followed by "ê" (U+00EA), which UTF-8 encodes as C3 AA.
var utf8_bom_e_circumflex_bytes = new byte[] {0xEF, 0xBB, 0xBF, 0xC3, 0xAA};
System.IO.File.WriteAllBytes(fname, utf8_bom_e_circumflex_bytes);

EDIT:

I think I have a solid handle on what happens (although I disagree with part of the .Net behavior).

  • Because the file starts with a UTF-8 BOM, ReadAllText ignores (overrides) the encoding I passed in and decodes the file as UTF-8. (The BOM is a strong hint that the file really is UTF-8, and the resulting string is indeed correct.) Still, I think .Net should not silently override an encoding I asked for explicitly, and that is the part I disagree with.

  • Without a BOM, .Net cannot be 100% sure of the encoding, so it honors my request and decodes the (actually UTF-8) bytes as ISO-8859-1. Since every byte sequence is valid ISO-8859-1, no exception is thrown, but the result is the wrong Unicode string, i.e. mojibake. (See the sketch after this list.)
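
A minimal sketch that reproduces both cases; the file names with_bom.txt and no_bom.txt are just placeholders for this demo:

using System;
using System.IO;
using System.Text;

// Strict ISO-8859-1: an undecodable byte would raise DecoderFallbackException.
var latin1 = Encoding.GetEncoding("ISO-8859-1",
    EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

// Case 1: UTF-8 with BOM -- ReadAllText spots the BOM and decodes as UTF-8,
// so the encoding argument is effectively ignored and the string is correct.
File.WriteAllBytes("with_bom.txt", new byte[] { 0xEF, 0xBB, 0xBF, 0xC3, 0xAA });
Console.WriteLine(File.ReadAllText("with_bom.txt", latin1)); // ê

// Case 2: same character without a BOM -- the UTF-8 bytes are decoded as
// ISO-8859-1, which never fails, so there is no exception, just mojibake.
File.WriteAllBytes("no_bom.txt", new byte[] { 0xC3, 0xAA });
Console.WriteLine(File.ReadAllText("no_bom.txt", latin1)); // Ãª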

1 answer

Your premise is wrong: the text is not invalid for ISO-8859-1.

ISO-8859-1 assigns a character to every possible byte value, so any sequence of bytes is valid ISO-8859-1 and decodes without error; the exception fallback never fires because there is nothing to fall back from.

(Admittedly, the byte values 0x80-0x9F decode to control characters rather than printable text, so they are rarely what you actually want. Strictly speaking, the ISO-8859 standards leave 0x80-0x9F unassigned; the ISO-8859-1 encoding maps them to the C1 control characters, whereas Windows-1252 uses most of those byte values for printable characters.)
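
A minimal sketch to convince yourself of that: feed every possible byte value through the same strict encoding the question builds, and no DecoderFallbackException is raised.

using System;
using System.Linq;
using System.Text;

var latin1 = Encoding.GetEncoding("ISO-8859-1",
    EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

// Every byte value from 0x00 to 0xFF in a single buffer.
var allBytes = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();

// Decodes without DecoderFallbackException: each byte maps to U+0000..U+00FF.
var s = latin1.GetString(allBytes);
Console.WriteLine(s.Length); // 256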

Additionally, because your file starts with a UTF-8 byte order mark, ReadAllText detects it and decodes the file as UTF-8, ignoring (overriding) the encoding you specified.

This is documented behavior. The documentation for ReadAllText says:

This method attempts to automatically detect the encoding of a file based on the presence of byte order marks.

If you don't want that behavior, read the file with File.ReadAllBytes and decode the bytes yourself with Encoding.GetString.
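
For instance, a minimal sketch of that approach, using the same strict ISO-8859-1 encoding and the utf8.txt file from the question (note the BOM is no longer consumed; it just comes back as ordinary Latin-1 characters):

using System;
using System.IO;
using System.Text;

var latin1 = Encoding.GetEncoding("ISO-8859-1",
    EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

// ReadAllBytes does no BOM sniffing; the bytes are decoded exactly as requested.
var bytes = File.ReadAllBytes("utf8.txt");
var s = latin1.GetString(bytes);
Console.WriteLine(s); // "ï»¿Ãª" -- mojibake, but still no exception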


Source: https://habr.com/ru/post/1718925/

