Java always encodes a String internally as UTF-16, regardless of its contents. http://docs.oracle.com/javase/6/docs/api/java/lang/String.html
You can convert it to any supported encoding, including ASCII and UTF-8, but you may lose characters that cannot be represented in the selected encoding.
Depending on why you are checking, you can convert the string to ASCII, read it back into a Java String, and see whether the two match. If they do, ASCII is enough to store your string. This is the most obvious check for later readers of your source code.
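A minimal sketch of that round trip (the class and method names are mine, not part of the original answer): encode to ASCII, decode back, and compare. Characters that cannot be encoded are replaced during encoding, so the comparison fails for any non-ASCII input.

    import java.nio.charset.Charset;

    public class AsciiRoundTrip {
        // True if every character survives an ASCII round trip.
        static boolean isAscii(String str) {
            Charset ascii = Charset.forName("US-ASCII");
            byte[] bytes = str.getBytes(ascii);            // unmappable characters become '?'
            return new String(bytes, ascii).equals(str);
        }

        public static void main(String[] args) {
            System.out.println(isAscii("hello"));  // true
            System.out.println(isAscii("héllo"));  // false
        }
    }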
You can also compare the Unicode code point of each character with 128: if they are all <= 127, the string is ASCII-compatible, i.e. it does not contain Arabic. To get the Unicode code point of a character in your string, use str.codePointAt(index) .
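A minimal sketch of that code-point scan (the method name isAsciiOnly is mine):

    // Scan code points; anything above 127 means the string is not pure ASCII.
    static boolean isAsciiOnly(String str) {
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            if (cp > 127) {
                return false;
            }
            i += Character.charCount(cp);  // step over surrogate pairs correctly
        }
        return true;
    }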
If you explicitly want to find Arabic text, you must explicitly check for Arabic characters. Otherwise you may get false positives for French, German, or many other languages that use accented characters. Fortunately, the Unicode Consortium groups the characters of a script into blocks, so the check essentially comes down to cp >= beginningOfUnicodeBlock && cp <= endOfUnicodeBlock .
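For example, a raw range check against the basic Arabic block (U+0600..U+06FF). This is only a sketch: it deliberately ignores the other Arabic-related blocks such as Arabic Supplement and the presentation forms.

    // True if the code point falls in the basic Arabic block U+0600..U+06FF.
    static boolean isInBasicArabicBlock(int cp) {
        return cp >= 0x0600 && cp <= 0x06FF;
    }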
Update prompted by tchrist: there are java.lang.Character.UnicodeBlock and java.lang.Character.UnicodeScript ; the latter was added in Java 7. Both can be used to classify Unicode code points.
    int cp = str.codePointAt(index);
    if (UnicodeScript.ARABIC.equals(UnicodeScript.of(cp))) {
        // the code point at index belongs to the Arabic script
    }
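Put together, a small self-contained sketch using UnicodeScript (Java 7+); the class and method names are mine:

    import java.lang.Character.UnicodeScript;

    public class ArabicDetector {
        // True if the string contains at least one code point of the Arabic script.
        static boolean containsArabic(String str) {
            for (int i = 0; i < str.length(); ) {
                int cp = str.codePointAt(i);
                if (UnicodeScript.ARABIC.equals(UnicodeScript.of(cp))) {
                    return true;
                }
                i += Character.charCount(cp);
            }
            return false;
        }

        public static void main(String[] args) {
            System.out.println(containsArabic("hello"));        // false
            System.out.println(containsArabic("hello مرحبا"));  // true
        }
    }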