How does String.ToLowerInvariant() determine which string/character it should convert?

As we know, Unicode was invented to solve the encoding problem and to represent all the characters of all (well, not all, but most) languages of the world. Then we have the Unicode Transformation Formats (UTF), which define how a Unicode character is represented in bytes:

  • UTF-8: one character takes 1 to 4 bytes
  • UTF-16: one character takes 2 bytes, or 2 * 2 = 4 bytes (.NET uses this)
  • UTF-32: one character always takes 4 bytes (I heard Python uses this)
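To make the list above concrete, here is a small C# sketch (not part of the original question) that counts how many bytes a few sample characters take in each encoding. The sample characters are arbitrary choices; `Encoding.UTF8`, `Encoding.Unicode` (UTF-16, little-endian) and `Encoding.UTF32` are the standard System.Text encodings.

```csharp
using System;
using System.Text;

// Sample characters: an ASCII letter, Slovenian č, the euro sign, and an emoji
// outside the Basic Multilingual Plane (it needs a UTF-16 surrogate pair).
string[] samples = { "a", "\u010D", "\u20AC", "\U0001F600" };

foreach (string s in samples)
{
    int utf8 = Encoding.UTF8.GetByteCount(s);
    int utf16 = Encoding.Unicode.GetByteCount(s); // Encoding.Unicode is UTF-16 (little-endian)
    int utf32 = Encoding.UTF32.GetByteCount(s);

    // string.Length counts UTF-16 code units, not characters.
    Console.WriteLine($"U+{char.ConvertToUtf32(s, 0):X4}: UTF-8 {utf8}, UTF-16 {utf16}, UTF-32 {utf32}, Length {s.Length}");
}
// U+0061: UTF-8 1, UTF-16 2, UTF-32 4, Length 1
// U+010D: UTF-8 2, UTF-16 2, UTF-32 4, Length 1
// U+20AC: UTF-8 3, UTF-16 2, UTF-32 4, Length 1
// U+1F600: UTF-8 4, UTF-16 4, UTF-32 4, Length 2
```

The last line shows why a .NET `string.Length` is not the same as the number of characters: the emoji is one character but two `char` values.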

Okay. Next, let's take two languages as an example:

English in the United Kingdom (en-GB) and Slovenian in Slovenia (sl-SI). English has the following characters: a, b, c, d, e, ..., x, y, z. Slovenian has the same characters except q, w, x and y, and has the additional characters č, š and ž. If I run the code below:

using System.Globalization;
using System.Threading;

Thread.CurrentThread.CurrentCulture = new CultureInfo("sl-SI");
string upperCase = "č".ToUpper(); // returns "Č", which is correct for the sl-SI culture

// Also returns "Č". How does it know that č must be converted to Č?
// What if some other language has the character č, and in that language č converts to X?
// How does it determine which character to convert to?
Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
string upperCase1 = "č".ToUpperInvariant();

Take Turkish as an example: the lowercase "i" becomes "İ" (U+0130, "Latin Capital Letter I With Dot Above") when it is uppercased. Similarly, the uppercase "I" becomes "ı" (U+0131, "Latin Small Letter Dotless I") when it is lowercased.


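So ToUpperInvariant() converts "i" to "I", not to "İ", no matter what the current culture is. The invariant culture is not tied to any language: it applies the same fixed case mapping to every character from \u0000 to \uFFFF.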
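A minimal C# sketch of the Turkish example, assuming the runtime has culture data for tr-TR (it would fail, for instance, under .NET's globalization-invariant mode). Passing the CultureInfo explicitly to ToUpper/ToLower avoids changing the thread's culture.

```csharp
using System;
using System.Globalization;

// Assumes tr-TR culture data is available (ICU or Windows NLS).
var turkish = new CultureInfo("tr-TR");

// Culture-sensitive casing: the Turkish dotted/dotless i.
Console.WriteLine("i".ToUpper(turkish) == "\u0130"); // True  (i -> İ)
Console.WriteLine("I".ToLower(turkish) == "\u0131"); // True  (I -> ı)

// Invariant casing ignores the Turkish rules.
Console.WriteLine("i".ToUpperInvariant() == "I");    // True
Console.WriteLine("I".ToLowerInvariant() == "i");    // True
```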


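Unicode defines the case mappings in two files:

UnicodeData.txt: the simple case mappings, where one character maps to exactly one character.

SpecialCasing.txt: additional mappings that are one-to-many, such as "ß" to "SS", or that apply only in a particular language or context.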

An excerpt from UnicodeData.txt:

0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049
010C;LATIN CAPITAL LETTER C WITH CARON;Lu;0;L;0043 030C;;;;N;LATIN CAPITAL LETTER C HACEK;;;010D;
010D;LATIN SMALL LETTER C WITH CARON;Ll;0;L;0063 030C;;;;N;LATIN SMALL LETTER C HACEK;;010C;;010C

(Lines not relevant here are omitted.)

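Ignoring the other fields, the last three fields are the simple uppercase, lowercase and titlecase mappings, so according to Unicode: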

uppercase(i) = I
uppercase(č) = Č
lowercase(Č) = č
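A minimal C# sketch of how the simple mappings above can be read out of UnicodeData.txt. It only parses the three excerpted lines (held in a string array rather than read from the real file), and the helpers `ToUpperSimple`, `ToLowerSimple` and `ParseCodePoint` are hypothetical names for illustration; this is not how .NET itself loads its casing data.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// The three lines from the UnicodeData.txt excerpt.
string[] unicodeData =
{
    "0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049",
    "010C;LATIN CAPITAL LETTER C WITH CARON;Lu;0;L;0043 030C;;;;N;LATIN CAPITAL LETTER C HACEK;;;010D;",
    "010D;LATIN SMALL LETTER C WITH CARON;Ll;0;L;0063 030C;;;;N;LATIN SMALL LETTER C HACEK;;010C;;010C",
};

var upper = new Dictionary<char, char>();
var lower = new Dictionary<char, char>();

foreach (string line in unicodeData)
{
    string[] fields = line.Split(';');
    char code = ParseCodePoint(fields[0]);
    // Field 12 = simple uppercase, 13 = simple lowercase, 14 = simple titlecase.
    if (fields[12] != "") upper[code] = ParseCodePoint(fields[12]);
    if (fields[13] != "") lower[code] = ParseCodePoint(fields[13]);
}

// A character without an entry maps to itself.
char ToUpperSimple(char c) => upper.TryGetValue(c, out char u) ? u : c;
char ToLowerSimple(char c) => lower.TryGetValue(c, out char l) ? l : c;

// Handles only BMP code points (4 hex digits), which is enough for this excerpt.
static char ParseCodePoint(string hex) => (char)int.Parse(hex, NumberStyles.HexNumber);

Console.WriteLine(ToUpperSimple('i')); // I
Console.WriteLine(ToUpperSimple('č')); // Č
Console.WriteLine(ToLowerSimple('Č')); // č
```

Note that nothing in this table depends on a language: the mapping for č is the same whoever uses it.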

Now for SpecialCasing.txt.

Its entries have the following format:

<code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>

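The condition list is optional; it restricts an entry to a particular language (such as tr) or context.

For example, the Turkish entry: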

# When uppercasing, i turns into a dotted capital I

0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I

So, when the language is Turkish (the tr condition is met), we get:

uppercase(i) = İ

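That is how the Unicode data defines it; how closely a given .NET version follows these tables is up to its implementation.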
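Similarly, a sketch of how a SpecialCasing.txt entry could override the simple mapping. Assumptions: only two entries are parsed (the Turkish one above and the ß entry `00DF; 00DF; 0053 0073; 0053 0053;` from the same file), only language conditions such as tr are supported, and `char.ToUpperInvariant` stands in for the UnicodeData.txt lookup. `ToUpperFull` and `FromHex` are hypothetical helpers, not .NET APIs; .NET's own ToUpper leaves "ß" unchanged rather than producing "SS".

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Two entries from SpecialCasing.txt, in the format
// <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
string[] specialCasing =
{
    "00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S",
    "0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I",
};

var rules = new List<(int Code, string Upper, string Condition)>();
foreach (string line in specialCasing)
{
    // Drop the comment, then split and trim the remaining fields.
    string[] fields = line.Split('#')[0].Split(';').Select(f => f.Trim()).ToArray();
    string condition = fields.Length > 4 ? fields[4] : "";
    rules.Add((int.Parse(fields[0], NumberStyles.HexNumber), FromHex(fields[3]), condition));
}

// Full uppercasing: use a SpecialCasing rule if one applies, otherwise fall back to the
// simple mapping. Rules with context conditions (e.g. Final_Sigma) never match here.
string ToUpperFull(string s, string language)
{
    var sb = new StringBuilder();
    foreach (char c in s)
    {
        var rule = rules.FirstOrDefault(r => r.Code == c && (r.Condition == "" || r.Condition == language));
        sb.Append(rule.Upper ?? char.ToUpperInvariant(c).ToString());
    }
    return sb.ToString();
}

// "0053 0053" -> "SS"
static string FromHex(string codePoints) =>
    string.Concat(codePoints.Split(' ', StringSplitOptions.RemoveEmptyEntries)
                            .Select(h => char.ConvertFromUtf32(int.Parse(h, NumberStyles.HexNumber))));

Console.WriteLine(ToUpperFull("i", "tr"));      // İ
Console.WriteLine(ToUpperFull("i", "en"));      // I
Console.WriteLine(ToUpperFull("straße", "de")); // STRASSE
```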


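It's worth noting that some characters have no separate uppercase or lowercase form.

What happens to them then?

They are simply returned unchanged.

ß is one such character.

For example: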

using System;
using System.Globalization;

var germanCulture = new CultureInfo("de-DE");

System.Threading.Thread.CurrentThread.CurrentCulture   = germanCulture;
System.Threading.Thread.CurrentThread.CurrentUICulture = germanCulture;

string s = "ß";

Console.WriteLine(s.ToUpper()); // Prints ß
Console.WriteLine(s.ToLower()); // Prints ß

// Aside: there is a special "uppercase" ß (U+1E9E), but it isn't
// returned by "ß".ToUpper().

string t = "ẞ"; // Special "uppercase" ß.

Console.WriteLine(t == s); // Prints False.

Console.WriteLine(s.ToUpper() == t); // Prints False.

(ẞ, U+1E9E, is the capital form of ß, but "ß".ToUpper() does not produce it.)


From MSDN:

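The invariant culture is culture-insensitive; it is associated with the English language but not with any country/region.

Unlike culture-sensitive data, which is subject to change by user customization or by updates to the .NET Framework or the operating system, invariant culture data is stable over time and across installed cultures and cannot be customized by users.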

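On the difference between ToLower and ToLowerInvariant: ToLower uses the casing rules of the current culture, while ToLowerInvariant always uses those of the invariant culture.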
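A short sketch of that difference, assuming tr-TR culture data is available. ToLower() reads the thread's current culture, while ToLowerInvariant() behaves like ToLower(CultureInfo.InvariantCulture); the keyword "TITLE" is just an arbitrary example of a culture-independent identifier.

```csharp
using System;
using System.Globalization;
using System.Threading;

// Assumes tr-TR culture data is available.
Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");

string keyword = "TITLE";

string currentLower = keyword.ToLower();                              // tr-TR rules: "tıtle"
string invariantLower = keyword.ToLowerInvariant();                   // "title" on any machine
string explicitLower = keyword.ToLower(CultureInfo.InvariantCulture); // same rules as ToLowerInvariant

Console.WriteLine(currentLower == "title");                 // False
Console.WriteLine(invariantLower == "title");               // True
Console.WriteLine(invariantLower == explicitLower);         // True
Console.WriteLine(CultureInfo.InvariantCulture.Name == ""); // True: the invariant culture has an empty name
```

Comparing identifiers or keywords after ToLower() is what breaks on Turkish systems; ToLowerInvariant() (or an ordinal, case-insensitive comparison) avoids it.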


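When to use the invariant culture:

  • when data is stored or exchanged as text and must be read back exactly as it was written, e.g. dates and numbers;

  • when code must behave the same on every machine, regardless of the user's regional settings.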

Example: we display a date as dd/MM/yyyy for users in India (IST), but on a machine set up for the US (EST), parsing that string can throw an exception or silently return a different date with day and month swapped. To avoid problems like this, we can use the invariant culture.
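A sketch of the date problem with an explicit format string. The date and the cultures are arbitrary examples. Note that the invariant culture alone is not enough here: its own short date pattern is MM/dd/yyyy, so the format is passed explicitly to both ToString and ParseExact.

```csharp
using System;
using System.Globalization;

var christmas = new DateTime(2023, 12, 25);

// In a custom format string, "/" means "the culture's date separator".
string invariantText = christmas.ToString("dd/MM/yyyy", CultureInfo.InvariantCulture); // "25/12/2023"
string germanText = christmas.ToString("dd/MM/yyyy", new CultureInfo("de-DE"));      // "25.12.2023"

// Parsing with en-US default patterns expects month first, so month 25 fails.
bool parsedAsUs = DateTime.TryParse(invariantText, new CultureInfo("en-US"), DateTimeStyles.None, out _);

// Parsing with the same explicit format and the invariant culture round-trips on any machine.
DateTime roundTripped = DateTime.ParseExact(invariantText, "dd/MM/yyyy", CultureInfo.InvariantCulture);

Console.WriteLine(invariantText);             // 25/12/2023
Console.WriteLine(germanText);                // 25.12.2023
Console.WriteLine(parsedAsUs);                // False
Console.WriteLine(roundTripped == christmas); // True
```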


Source: https://habr.com/ru/post/1610392/

