Why does US-ASCII decoding not reject non-US-ASCII bytes?

Consider the following code:

import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;

public class ReadingTest {

    public void readAndPrint(String usingEncoding) throws Exception {
        // The two bytes are the UTF-8 representation of the 'micro' sign (U+00B5)
        ByteArrayInputStream bais = new ByteArrayInputStream(new byte[]{(byte) 0xC2, (byte) 0xB5});
        InputStreamReader isr = new InputStreamReader(bais, usingEncoding);
        char[] cbuf = new char[2];
        isr.read(cbuf);
        System.out.println(cbuf[0] + " " + (int) cbuf[0]);
    }

    public static void main(String[] argv) throws Exception {
        ReadingTest w = new ReadingTest();
        w.readAndPrint("UTF-8");
        w.readAndPrint("US-ASCII");
    }
}

Observed output:

µ 181
? 65533

Why does the second call to readAndPrint() (the one that uses US-ASCII) succeed? I would expect it to throw an error, since the input is not a valid byte sequence in this encoding. Where in the Java API or the JLS is this behavior defined?

2 answers

The default behavior when the decoder encounters undecodable bytes in the input stream is to replace them with the Unicode character U+FFFD REPLACEMENT CHARACTER.

If you want to change this, you can pass a CharsetDecoder to the InputStreamReader constructor, with a different CodingErrorAction configured on it:

CharsetDecoder decoder = Charset.forName(usingEncoding).newDecoder();
decoder.onMalformedInput(CodingErrorAction.REPORT); // throw instead of substituting U+FFFD
InputStreamReader isr = new InputStreamReader(bais, decoder);
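With REPORT configured, the US-ASCII call from the question fails fast instead of substituting U+FFFD. A minimal, self-contained sketch reusing the question's two UTF-8 bytes (the class name StrictDecodingDemo is illustrative, not from the original answer):

import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;

public class StrictDecodingDemo {
    public static void main(String[] args) throws Exception {
        byte[] input = {(byte) 0xC2, (byte) 0xB5}; // UTF-8 bytes of 'µ'; not valid US-ASCII

        CharsetDecoder decoder = Charset.forName("US-ASCII").newDecoder();
        decoder.onMalformedInput(CodingErrorAction.REPORT);

        try (InputStreamReader isr =
                 new InputStreamReader(new ByteArrayInputStream(input), decoder)) {
            char[] cbuf = new char[2];
            isr.read(cbuf); // throws MalformedInputException on the 0xC2 byte
            System.out.println("decoded: " + cbuf[0]);
        } catch (MalformedInputException e) {
            System.out.println("rejected: " + e); // expected path for this input
        }
    }
}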

This behavior is documented, for example, in the Javadoc of the constructor String(byte bytes[], int offset, int length, Charset charset):

"This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement string. The java.nio.charset.CharsetDecoder class should be used when more control over the decoding process is required."

So if you want stricter handling, use a CharsetDecoder configured with the appropriate CodingErrorAction.
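To see both behaviors side by side, a hedged sketch (the class name DecodeContrast is illustrative) contrasting the lenient String constructor with a strict CharsetDecoder:

import java.nio.ByteBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodeContrast {
    public static void main(String[] args) throws Exception {
        byte[] input = {(byte) 0xC2, (byte) 0xB5};

        // The String constructor silently replaces each undecodable byte with U+FFFD.
        String lenient = new String(input, 0, input.length, StandardCharsets.US_ASCII);
        System.out.println(lenient + " " + (int) lenient.charAt(0)); // "?? 65533" on an ASCII console

        // A CharsetDecoder set to REPORT throws instead of substituting.
        CharsetDecoder strict = StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        strict.decode(ByteBuffer.wrap(input)); // throws MalformedInputException
    }
}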


Source: https://habr.com/ru/post/1789746/

