Why does US-ASCII decoding not reject non-US-ASCII bytes?

Consider the following code:

import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;

public class ReadingTest {

    public void readAndPrint(String usingEncoding) throws Exception {
        // The two bytes are the UTF-8 representation of the 'micro' sign (U+00B5)
        ByteArrayInputStream bais = new ByteArrayInputStream(new byte[]{(byte) 0xC2, (byte) 0xB5});
        InputStreamReader isr = new InputStreamReader(bais, usingEncoding);
        char[] cbuf = new char[2];
        isr.read(cbuf);
        System.out.println(cbuf[0] + " " + (int) cbuf[0]);
    }

    public static void main(String[] argv) throws Exception {
        ReadingTest w = new ReadingTest();
        w.readAndPrint("UTF-8");
        w.readAndPrint("US-ASCII");
    }
}

Observed output:

µ 181
? 65533

Why does the second call to readAndPrint() (the one that uses US-ASCII) succeed? I would expect it to throw an error, since the input is not a valid byte sequence in this encoding. Where in the Java API or the JLS is this behavior defined?

2 answers

The default behavior when the decoder encounters undecodable bytes in the input stream is to replace them with the Unicode character U+FFFD REPLACEMENT CHARACTER.

If you want to change this, you can pass a CharsetDecoder to the InputStreamReader constructor, with a different CodingErrorAction configured on it:

CharsetDecoder decoder = Charset.forName(usingEncoding).newDecoder();
decoder.onMalformedInput(CodingErrorAction.REPORT); // throw instead of substituting U+FFFD
InputStreamReader isr = new InputStreamReader(bais, decoder);
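With REPORT configured, the US-ASCII call from the question fails fast instead of substituting U+FFFD. A minimal, self-contained sketch reusing the question's two UTF-8 bytes (the class name StrictDecodingDemo is illustrative, not from the original answer):

import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;

public class StrictDecodingDemo {
    public static void main(String[] args) throws Exception {
        byte[] input = {(byte) 0xC2, (byte) 0xB5}; // UTF-8 bytes of 'µ'; not valid US-ASCII

        CharsetDecoder decoder = Charset.forName("US-ASCII").newDecoder();
        decoder.onMalformedInput(CodingErrorAction.REPORT);

        try (InputStreamReader isr =
                 new InputStreamReader(new ByteArrayInputStream(input), decoder)) {
            char[] cbuf = new char[2];
            isr.read(cbuf); // throws MalformedInputException on the 0xC2 byte
            System.out.println("decoded: " + cbuf[0]);
        } catch (MalformedInputException e) {
            System.out.println("rejected: " + e); // expected path for this input
        }
    }
}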

This behavior is documented, for example, in the Javadoc of the constructor String(byte bytes[], int offset, int length, Charset charset):

"This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement string. The java.nio.charset.CharsetDecoder class should be used when more control over the decoding process is required."

So if you want stricter handling, use a CharsetDecoder configured with the appropriate CodingErrorAction.
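To see both behaviors side by side, a hedged sketch (the class name DecodeContrast is illustrative) contrasting the lenient String constructor with a strict CharsetDecoder:

import java.nio.ByteBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodeContrast {
    public static void main(String[] args) throws Exception {
        byte[] input = {(byte) 0xC2, (byte) 0xB5};

        // The String constructor silently replaces each undecodable byte with U+FFFD.
        String lenient = new String(input, 0, input.length, StandardCharsets.US_ASCII);
        System.out.println(lenient + " " + (int) lenient.charAt(0)); // "?? 65533" on an ASCII console

        // A CharsetDecoder set to REPORT throws instead of substituting.
        CharsetDecoder strict = StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        strict.decode(ByteBuffer.wrap(input)); // throws MalformedInputException
    }
}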


Source: https://habr.com/ru/post/1789746/

