As we know, Unicode was invented to solve the encoding problem and to represent the characters of all (well, not all, but most) languages of the world. On top of that we have the Unicode transformation formats, which define how a Unicode character is represented in bytes (a short C# check of the byte counts follows this list):
- UTF-8: one character can take from 1 to 4 bytes
- UTF-16: one character takes 2 bytes or 2 * 2 bytes = 4 bytes (.NET uses this)
- UTF-32: one character always takes 4 bytes (I have heard Python uses this)
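Just to illustrate those byte counts, here is a minimal C# sketch (my own check, not part of the original question; the class name is made up). It simply asks the built-in Encoding classes how many bytes a character needs in each format:

using System;
using System.Text;

class ByteCounts
{
    static void Main()
    {
        // "č" is a single code point (U+010D) inside the Basic Multilingual Plane.
        Console.WriteLine(Encoding.UTF8.GetByteCount("č"));     // 2 bytes in UTF-8
        Console.WriteLine(Encoding.Unicode.GetByteCount("č"));  // 2 bytes in UTF-16 (one code unit)
        Console.WriteLine(Encoding.UTF32.GetByteCount("č"));    // 4 bytes in UTF-32
        // "😀" (U+1F600) lies outside the BMP, so it needs more bytes.
        Console.WriteLine(Encoding.UTF8.GetByteCount("😀"));    // 4 bytes in UTF-8
        Console.WriteLine(Encoding.Unicode.GetByteCount("😀")); // 4 bytes in UTF-16 (a surrogate pair)
    }
}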
So far, so good. Next, let's take two languages as an example:
English in the United Kingdom (en-GB) and Slovenian in Slovenia (sl-SI). English has the letters a, b, c, d, e, ... x, y, z. Slovenian has the same letters except q, w, x, y, and adds č, š, ž. If I run the code below:
using System.Globalization;
using System.Threading;

Thread.CurrentThread.CurrentCulture = new CultureInfo("sl-SI");
string upperCase = "č".ToUpper();            // culture-sensitive casing: "Č"
Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
string upperCase1 = "č".ToUpperInvariant();  // culture-insensitive casing: still "Č"
Take Turkey as an example: the lowercase "i" becomes "İ" (U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE) when it is converted to uppercase. Similarly, our uppercase "I" becomes "ı" (U+0131, LATIN SMALL LETTER DOTLESS I) when it is converted to lowercase.
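To make the Turkish example concrete, here is a small sketch (the class and variable names are mine). It uses the ToUpper(CultureInfo) and ToLower(CultureInfo) overloads so the thread culture does not need to be changed:

using System;
using System.Globalization;

class TurkishCasing
{
    static void Main()
    {
        var turkish = new CultureInfo("tr-TR");
        Console.WriteLine("i".ToUpper(turkish));   // "İ" (U+0130), the Turkish rule applies
        Console.WriteLine("I".ToLower(turkish));   // "ı" (U+0131), dotless lowercase i
        Console.WriteLine("i".ToUpperInvariant()); // "I", the invariant culture ignores the Turkish rule
    }
}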


However, ToUpperInvariant() converts "i" not to "İ" but to "I". Why is that? As far as I understand, the invariant culture covers the characters from \u0000 to \uFFFF, but I am not sure how its casing rules are chosen.
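To phrase that last point as something runnable: the sketch below (my own, assuming a plain console app) walks the \u0000 to \uFFFF range and counts how many characters the invariant uppercase mapping actually changes, which at least shows that the mapping is fixed and independent of the thread culture:

using System;
using System.Globalization;
using System.Threading;

class InvariantRange
{
    static void Main()
    {
        // Switching the thread culture should not affect char.ToUpperInvariant at all.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
        int changed = 0;
        for (int cp = 0; cp <= 0xFFFF; cp++)
        {
            char c = (char)cp;
            if (char.ToUpperInvariant(c) != c) changed++;
        }
        // Prints the number of BMP code points that have a distinct invariant uppercase form.
        Console.WriteLine(changed);
    }
}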