Why is the same character compared twice, first converted to uppercase and then to lowercase?

Question

Below is code from the String class in Java. I do not understand why the characters of the two strings are compared twice: first after converting both to uppercase and then, if they still differ, after converting both to lowercase.

My question is: is this needed? If so, why?

public static final Comparator<String> CASE_INSENSITIVE_ORDER
        = new CaseInsensitiveComparator();

private static class CaseInsensitiveComparator
        implements Comparator<String>, java.io.Serializable {
    // use serialVersionUID from JDK 1.2.2 for interoperability
    private static final long serialVersionUID = 8575799808933029326L;

    public int compare(String s1, String s2) {
        int n1 = s1.length();
        int n2 = s2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 != c2) {
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        // No overflow because of numeric promotion
                        return c1 - c2;
                    }
                }
            }
        }
        return n1 - n2;
    }
}

java string comparator unicode

Tushar banne Jan 05 '16 at 14:09

1 answer

Jan · Accepted Answer · 2016-01-05T14:13:27+0000

The issue is more subtle than it looks.

Some characters have several lowercase forms for the same uppercase form, or vice versa. So, to test whether two characters match regardless of case, you need to compare both the uppercase and the lowercase versions and treat the characters as equal if either comparison matches.

One example is

The Greek uppercase letter "Σ" has two different lowercase forms: "ς" in word-final position and "σ" elsewhere.

Source: Wikipedia
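
To make the sigma case concrete, here is a minimal sketch using only Character.toUpperCase, Character.toLowerCase and String.CASE_INSENSITIVE_ORDER from the JDK. The class name SigmaCaseDemo and the helpers sameUpper and sameLower are made up for illustration; they are not part of the String class.

```java
public class SigmaCaseDemo {
    // Hypothetical helper: true if both chars map to the same uppercase char.
    static boolean sameUpper(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    // Hypothetical helper: true if both chars map to the same lowercase char.
    static boolean sameLower(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    public static void main(String[] args) {
        char finalSigma = '\u03C2'; // ς, the word-final lowercase sigma
        char sigma = '\u03C3';      // σ, the ordinary lowercase sigma

        // Both lowercase forms share the uppercase form Σ (U+03A3).
        System.out.println(sameUpper(finalSigma, sigma)); // prints true

        // Each is already lowercase, so lowercasing keeps them distinct.
        System.out.println(sameLower(finalSigma, sigma)); // prints false

        // The comparator treats them as equal thanks to the uppercase step.
        System.out.println(
                String.CASE_INSENSITIVE_ORDER.compare("\u03C2", "\u03C3")); // prints 0
    }
}
```

A comparison that only lowercased the characters would report these two letters as different, which is why the uppercase step is there.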

For the opposite situation, where the uppercase forms differ but the lowercase forms are the same, VGR provided this great example:

A better example would be '\u0130' (İ) and 'I'. Passing them through toUpperCase leaves them unchanged (and therefore different), but passing them through toLowerCase results in identical character values.
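
VGR's example can be checked with a short sketch along the same lines. The class name DottedIDemo and its helper methods are hypothetical; only the Character and String calls come from the JDK.

```java
public class DottedIDemo {
    // Hypothetical helper: true if both chars map to the same uppercase char.
    static boolean sameUpper(char a, char b) {
        return Character.toUpperCase(a) == Character.toUpperCase(b);
    }

    // Hypothetical helper: true if both chars map to the same lowercase char.
    static boolean sameLower(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }

    public static void main(String[] args) {
        char dottedI = '\u0130'; // İ, Latin capital letter I with dot above
        char plainI = 'I';

        // Both are already uppercase, so the uppercase step leaves them different.
        System.out.println(sameUpper(dottedI, plainI)); // prints false

        // Character.toLowerCase maps both to 'i', so the lowercase step matches.
        System.out.println(sameLower(dottedI, plainI)); // prints true

        // The comparator therefore reports the two strings as equal.
        System.out.println(
                String.CASE_INSENSITIVE_ORDER.compare("\u0130", "I")); // prints 0
    }
}
```

Here a comparison that only uppercased the characters would miss the match, so the two steps together cover both directions.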
