You simply iterate over the string and use the Character methods to classify each element. I work with real code points, so this also supports Unicode supplementary characters.
When working with code points, the index cannot simply be incremented by one, because some code points occupy two chars (aka code units). This is why I use a while loop and Character.charCount(int cp).
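To make the code unit vs. code point difference concrete, here is a small sketch. The class name CodeUnitDemo and the helper countCodePoints are made up for illustration; only the String and Character calls are standard JDK API.

```java
class CodeUnitDemo {

    /** Counts code points by stepping 1 or 2 chars at a time (hypothetical helper). */
    static int countCodePoints(String s) {
        int count = 0;
        int index = 0;
        while (index < s.length()) {
            int cp = s.codePointAt(index);
            // charCount is 2 for supplementary code points, 1 otherwise
            index += Character.charCount(cp);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a' followed by U+10400, which is stored as the surrogate pair D801 DC00
        String s = "a\uD801\uDC00";
        System.out.println(s.length());          // number of UTF-16 code units
        System.out.println(countCodePoints(s));  // number of code points
    }
}
```

A plain for loop with index++ would land on the low surrogate DC00 and treat it as a separate element.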
static void countCharacterClasses(String input) {
    int upper = 0;
    int lower = 0;
    int other = 0;
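The listing stops after the counter declarations. A minimal sketch of a complete version, assuming the approach described above (a while loop, codePointAt, Character.charCount and Character.getType), could look like this. The wrapper class CodePointClasses and the helper classify are hypothetical names added so the counts can be checked without parsing printed output.

```java
class CodePointClasses {

    /** Returns {upper, lower, other} code point counts (hypothetical helper). */
    static int[] classify(String input) {
        int upper = 0;
        int lower = 0;
        int other = 0;

        int index = 0;
        while (index < input.length()) {
            // read the full code point starting at index
            int cp = input.codePointAt(index);
            // advance by 1 or 2 chars, depending on whether cp fits in a single char
            index += Character.charCount(cp);
            // the general category of the code point decides which counter to bump
            switch (Character.getType(cp)) {
                case Character.UPPERCASE_LETTER:
                    upper++;
                    break;
                case Character.LOWERCASE_LETTER:
                    lower++;
                    break;
                default:
                    other++;
            }
        }
        return new int[] { upper, lower, other };
    }

    /** Counts and prints the number of upper/lowercase/other code points. */
    static void countCharacterClasses(String input) {
        int[] c = classify(input);
        System.out.printf("Input has %d upper, %d lower and %d other codepoints%n",
                          c[0], c[1], c[2]);
    }

    public static void main(String[] args) {
        countCharacterClasses("Hello World 42");
    }
}
```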
For this example, the result will be:
// test with plain letters, numbers and international chars:
countCharacterClasses("AABBÄäoßabc0\uD801\uDC00");
// U+10400 "DESERET CAPITAL LETTER LONG I" is 2 chars in UTF-16: D801 DC00

Input has 6 upper, 6 lower and 1 other codepoints
It counts the German sharp s (ß) as lowercase (it has no simple uppercase mapping) and the supplementary code point (which is two code units / chars long) as uppercase. The digit is counted as "other".
Using Character.getType(int cp) instead of Character.isUpperCase() has the advantage that the code point only needs to be looked at once for several (or all) character classes. It can also be used to count all the different classes: letters, spaces, controls, and every other Unicode category (TITLECASE_LETTER, etc.).
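As a rough illustration of counting every class in one pass, the sketch below keys a map by the value returned from Character.getType. The class name CodePointTypeHistogram and the method histogram are invented for this example.

```java
import java.util.Map;
import java.util.TreeMap;

class CodePointTypeHistogram {

    /** Maps each Character.getType value to the number of code points of that type. */
    static Map<Integer, Integer> histogram(String input) {
        Map<Integer, Integer> counts = new TreeMap<>();
        int index = 0;
        while (index < input.length()) {
            int cp = input.codePointAt(index);
            index += Character.charCount(cp);
            // one getType call per code point covers all categories
            counts.merge(Character.getType(cp), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // U+01C5 is a titlecase letter (general category Lt)
        Map<Integer, Integer> counts = histogram("Ab 1\u01C5");
        System.out.println(counts);
    }
}
```

The keys are the int constants from Character (UPPERCASE_LETTER, SPACE_SEPARATOR, and so on), so a lookup such as counts.get((int) Character.TITLECASE_LETTER) gives the count for one category.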
For good background on why you need to care about code points and code units, see: http://www.joelonsoftware.com/articles/Unicode.html