How does String.ToLowerInvariant () determine which string / character it should convert?

Question


As we know, Unicode was invented to solve the encoding problem and to represent the characters of all (well, not all, but most) of the world's languages. Then we have the Unicode transformation formats, which define how a Unicode character is represented in bytes:

UTF-8: one character takes 1 to 4 bytes
UTF-16: one character takes 2 bytes or 2 * 2 = 4 bytes (.NET uses this)
UTF-32: one character always takes 4 bytes (I heard Python uses this)

Fine. Next, let's take two languages as an example:

English in the United Kingdom (en-GB) and Slovenian in Slovenia (sl-SI). English has the following characters: a, b, c, d, e, ... x, y, z. Slovenian has the same characters except x and y, plus the additional characters č, š, ž. If I run the code below:

using System.Globalization;
using System.Threading;

Thread.CurrentThread.CurrentCulture = new CultureInfo("sl-SI");
string upperCase = "č".ToUpper(); // returns Č, which is correct for the sl-SI culture

// Also returns Č. How does it know that it must convert č to Č?
// What if some other language has the character č, and in that language č converts to X?
// How does it determine which character to convert to?
Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
string upperCase1 = "č".ToUpperInvariant();

Take Turkish as an example: the lowercase "i" becomes "İ" (U+0130, "Latin Capital Letter I With Dot Above") when converted to uppercase. Similarly, the uppercase "I" becomes "ı" (U+0131, "Latin Small Letter Dotless I") when converted to lowercase.

So with ToUpperInvariant(), does "i" become "İ" or "I"? And how is that decided for characters anywhere in the range \u0000 to \uFFFF?

+4

c# unicode

broadband Oct 6 '15 at 9:10

4 answers


Consider the German ß, for example:

var germanCulture = new CultureInfo("de-DE");

System.Threading.Thread.CurrentThread.CurrentCulture   = germanCulture;
System.Threading.Thread.CurrentThread.CurrentUICulture = germanCulture;

string s = "ß";

Console.WriteLine(s.ToUpper()); // Prints ß
Console.WriteLine(s.ToLower()); // Prints ß

// Aside: There is a special "uppercase" ß, but it isn't
// returned from "ß".ToUpper();

string t = "ẞ"; // Special "uppercase" ß.

Console.WriteLine(t == s); // Prints false.

Console.WriteLine(s.ToUpper() == t); // Prints false.

(Aside: there is a special "uppercase" ß (ẞ), but it is not what "ß".ToUpper() returns.)

+2

Matthew Watson Oct 6 '15 at 9:17

MSDN:


this

ToLower ToLowerInvariant. , . , Windows ,

+1

Rahul Tripathi Oct 6 '15 at 9:17


Example: we display a date as dd/MM/yyyy in IST, but in EST the same code may throw an exception or produce a different value. To avoid problems like this, we can use the invariant culture.
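A minimal sketch of that idea, assuming the concern is culture-dependent formatting rather than the time zone itself. The class and method names (InvariantDateDemo, Format) are made up for illustration, and the de-DE result assumes culture data is available on the machine:

```csharp
using System;
using System.Globalization;

public static class InvariantDateDemo
{
    // Formats a date with an explicitly chosen culture instead of
    // relying on whatever the current thread culture happens to be.
    public static string Format(DateTime value, CultureInfo culture) =>
        value.ToString("dd/MM/yyyy", culture);

    public static void Main()
    {
        var date = new DateTime(2015, 10, 6);

        // In a custom format string, "/" stands for the culture's date separator.
        Console.WriteLine(Format(date, CultureInfo.InvariantCulture)); // 06/10/2015
        Console.WriteLine(Format(date, new CultureInfo("de-DE")));     // 06.10.2015
    }
}
```

Passing CultureInfo.InvariantCulture makes the output the same on every machine, which is usually what you want for data that is stored or exchanged rather than shown to a user.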

0

Chetan sharma Oct 6 '15 at 9:27


nwellnhof · Accepted Answer · 2015-10-06T12:34:16+0000

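Unicode defines case mappings in two files:

UnicodeData.txt: simple case mappings. These are one-to-one and locale-independent.
SpecialCasing.txt: more complex mappings, for example one-to-many mappings such as "ß" to "SS". It also contains mappings that are locale-specific or context-sensitive.

UnicodeData.txt contains the following entries for the characters in question: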

0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049
010C;LATIN CAPITAL LETTER C WITH CARON;Lu;0;L;0043 030C;;;;N;LATIN CAPITAL LETTER C HACEK;;;010D;
010D;LATIN SMALL LETTER C WITH CARON;Ll;0;L;0063 030C;;;;N;LATIN SMALL LETTER C HACEK;;010C;;010C

(The last three fields are the simple uppercase, lowercase, and titlecase mappings, in that order.)
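To make the field layout concrete, here is a small sketch that pulls the three simple case mappings out of one UnicodeData.txt line. The names UnicodeDataLine and SimpleMappings are hypothetical helpers for illustration only; this is not how .NET reads its casing data.

```csharp
using System;

public static class UnicodeDataLine
{
    // Returns the simple uppercase, lowercase and titlecase mappings
    // (fields 12, 13 and 14) of a single UnicodeData.txt line.
    // An empty string means the file lists no mapping for that case.
    public static (string Upper, string Lower, string Title) SimpleMappings(string line)
    {
        string[] fields = line.Split(';');
        if (fields.Length != 15)
            throw new FormatException("Expected 15 semicolon-separated fields.");
        return (fields[12], fields[13], fields[14]);
    }

    public static void Main()
    {
        var m = SimpleMappings("0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049");
        Console.WriteLine($"upper={m.Upper} lower={m.Lower} title={m.Title}");
        // upper=0049 lower= title=0049
    }
}
```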

So, unless a special-case rule overrides them, Unicode defines the following default mappings:

uppercase(i) = I
uppercase(č) = Č
lowercase(Č) = č
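As a quick check, the invariant methods in .NET give the same results as these default mappings for the three characters. This is only a sketch that compares results; it assumes a runtime with normal globalization support and says nothing about how .NET stores its casing tables. The class name is made up.

```csharp
using System;

public static class InvariantCasingDemo
{
    public static void Main()
    {
        // \u010D is č and \u010C is Č; escapes avoid source-encoding problems.
        Console.WriteLine("i".ToUpperInvariant() == "I");           // True
        Console.WriteLine("\u010D".ToUpperInvariant() == "\u010C"); // True
        Console.WriteLine("\u010C".ToLowerInvariant() == "\u010D"); // True
    }
}
```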

The entries in SpecialCasing.txt have the following format:
<code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>

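The optional condition list can restrict a mapping to particular languages or contexts.

For the Turkish example, the file contains this entry: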

# When uppercasing, i turns into a dotted capital I

0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I

So, for Turkish (language code tr), the following applies instead:

uppercase(i) = İ
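The same distinction can be observed in .NET by passing a culture explicitly. This is a sketch that assumes the tr-TR culture is available (that is, the runtime is not in invariant-globalization mode); the class name is made up.

```csharp
using System;
using System.Globalization;

public static class TurkishCasingDemo
{
    public static void Main()
    {
        var turkish = new CultureInfo("tr-TR");

        // Turkish rules: i <-> İ (U+0130) and ı (U+0131) <-> I.
        Console.WriteLine("i".ToUpper(turkish) == "\u0130"); // True
        Console.WriteLine("I".ToLower(turkish) == "\u0131"); // True

        // The invariant methods ignore the Turkish special case.
        Console.WriteLine("i".ToUpperInvariant() == "I");    // True
    }
}
```

Passing the culture as an argument keeps the example independent of Thread.CurrentThread.CurrentCulture.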

That is how Unicode defines it. Of course, the .NET implementation may differ.
