In my opinion, the test program in the question is flawed because it performs round-trip conversions between values that have no semantic relationship to each other.
If you want to check whether all byte values are valid for a given encoding, then something like this is more appropriate:
    public static void tryEncoding(final String encoding) throws UnsupportedEncodingException {
        int badCount = 0;
        for (int i = 1; i <= 255; i++) {
            byte[] bytes = new byte[] { (byte) i };
            String toString = new String(bytes, encoding);
            byte[] fromString = toString.getBytes(encoding);
            if (!Arrays.equals(bytes, fromString)) {
                System.out.println("Can't encode: " + i
                        + " - in: " + Arrays.toString(bytes)
                        + " / out: " + Arrays.toString(fromString)
                        + " - result: " + toString);
                badCount++;
            }
        }
        System.out.println("Bad count: " + badCount);
    }
Please note that this test program checks its input using (unsigned) byte values from 1 to 255. The code in the question instead uses char values (equivalent to Unicode code points in this range) from 1 to 255.
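To see why the distinction matters, here is a minimal sketch contrasting the two interpretations of the value 130: decoded as a Windows-1252 byte it yields U+201A (a curly quotation mark), while as a char it is the C1 control character U+0082. (The class name is mine; the mapping is standard Windows-1252 behavior.)

```java
import java.io.UnsupportedEncodingException;

public class ByteVsChar {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Byte 130 (0x82) decoded with Windows-1252 gives U+201A...
        String fromByte = new String(new byte[] { (byte) 130 }, "windows-1252");
        // ...but the char value 130 is the C1 control character U+0082.
        char fromChar = (char) 130;
        System.out.println(Integer.toHexString(fromByte.charAt(0))); // 201a
        System.out.println(Integer.toHexString(fromChar));           // 82
    }
}
```

So iterating over chars does not exercise the same inputs as iterating over bytes.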
Try printing the actual byte arrays processed by the program in the question, and you will see that you are not actually checking all byte values, and that some of your "bad" matches are duplicates of others.
Running this using "Windows-1252" as the argument produces this output:
Can't encode: 129 - in: [-127] / out: [63] - result:
Can't encode: 141 - in: [-115] / out: [63] - result:
Can't encode: 143 - in: [-113] / out: [63] - result:
Can't encode: 144 - in: [-112] / out: [63] - result:
Can't encode: 157 - in: [-99] / out: [63] - result:
Bad count: 5
Which tells us that Windows-1252 does not accept the byte values 129, 141, 143, 144, and 157 (hex 81, 8D, 8F, 90, and 9D) as valid values. (Note: I'm talking about unsigned values here. The output above shows -127, -115, ... because Java only knows signed bytes.)
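The signed/unsigned point can be verified directly: Java's byte type is signed, so casting 129 wraps it to -127, and masking with 0xFF recovers the unsigned value. (A quick illustration; the class name is mine.)

```java
public class SignedBytes {
    public static void main(String[] args) {
        byte b = (byte) 129;          // stored as -127: Java bytes are signed
        System.out.println(b);        // -127
        System.out.println(b & 0xFF); // 129: the mask recovers the unsigned value
    }
}
```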
The Wikipedia article on Windows-1252 seems to confirm this observation by stating the following:
According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused