java.nio.charset.Charset has a newDecoder() method that returns a CharsetDecoder . CharsetDecoder has isAutoDetecting() , isCharsetDetected() and detectedCharset() methods that seem useful for your task. Unfortunately, these operations are optional: isCharsetDetected() and detectedCharset() only work when the decoder is auto-detecting.
I think you should take all the available charsets ( Charset.availableCharsets() ) and first check which of them are auto-detecting. Then, when you receive a new stream, start with the built-in auto-detection mechanism of those encodings that implement these optional operations.
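A minimal sketch of that first pass might look like the following. Note that on a standard JDK very few (often zero) installed charsets are actually auto-detecting, so a null result is the common case; the class and method names here are my own invention for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

public class AutoDetect {
    // Hypothetical helper: ask every auto-detecting decoder to identify the bytes.
    public static Charset tryAutoDetect(byte[] bytes) {
        for (Charset cs : Charset.availableCharsets().values()) {
            CharsetDecoder dec = cs.newDecoder();
            if (!dec.isAutoDetecting()) {
                continue; // this charset does not support auto-detection
            }
            try {
                // Feed the input; an auto-detecting decoder inspects it as it decodes.
                dec.decode(ByteBuffer.wrap(bytes));
                if (dec.isCharsetDetected()) {
                    return dec.detectedCharset();
                }
            } catch (Exception e) {
                // This decoder could not handle the stream; try the next one.
            }
        }
        return null; // no auto-detecting decoder recognized the input
    }
}
```

On most JVMs this loop finds nothing to try, which is exactly why the fallback strategy below is needed.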
If none of these decoders detects the charset, fall back to decoding the stream yourself (as you described), trying the other encodings one by one. To speed up that process, sort the candidate encodings using the following criteria.
National alphabets first. For example, try Cyrillic encodings before Latin ones.
Among the national alphabets, prefer those with larger character repertoires. Japanese and Chinese encodings, for example, go to the front of the line.
The reason for this ordering is that you want a wrong guess to fail as quickly as possible. If your text contains no Japanese characters, a Japanese decoder will typically reject the stream within the first few characters. But if you try to decode French text as ASCII, you may have to read many characters before you hit the first è .
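For this fail-fast fallback to work, the decoder must be configured to report errors instead of silently replacing bad bytes (the default for Charset.decode() is to substitute them). A sketch, where tryDecode is a hypothetical helper name:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class FailFastDecode {
    // Returns the decoded text, or null if 'bytes' is not valid in charset 'cs'.
    public static String tryDecode(byte[] bytes, Charset cs) {
        CharsetDecoder dec = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail on the first invalid byte
                .onUnmappableCharacter(CodingErrorAction.REPORT); // fail on unmappable sequences
        try {
            CharBuffer out = dec.decode(ByteBuffer.wrap(bytes));
            return out.toString();
        } catch (CharacterCodingException e) {
            return null; // this encoding does not fit; move on to the next candidate
        }
    }
}
```

You would call tryDecode with each candidate charset in your sorted order and stop at the first non-null result.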