C# - Comparing strings of different encodings

Question


Using C#, I read the TextBox.Text value on an .ascx page. When I compare that value for equality with a regular string object inside a LINQ query, the comparison always returns false.

I concluded that the two strings are encoded differently, but I have had no luck converting or comparing them.

 string docname = "Testdoc 1.docx"; // regular string created in C#
 string fetchedVal = ((TextBox)e.Item.FindControl("txtSelectedDocs")).Text; // UTF-8

The two strings above look identical when printed, but when I compare their byte[] representations they differ, apparently because of the encoding.

I tried many different things, for example:

 System.Text.Encoding.Default.GetString(utf8.GetBytes(fetchedVal));

but this returns the value "TestdocÂ 1.docx".

If I try instead

 System.Text.Encoding.Default.GetString(System.Text.Encoding.Default.GetBytes(fetchedVal));

it returns "Testdoc 1.docx", but the Equals() check still returns false.

I also tried the following, which seems to be recommended, but with no luck:

 byte[] utf8Bytes = Encoding.UTF8.GetBytes(fetchedVal);
 byte[] unicodeBytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes);
 string fetchedValConverted = Encoding.Unicode.GetString(unicodeBytes);

The culprit seems to be the space: when I examine the byte sequences, it is always the seventh byte that differs.

How do you correctly convert from UTF-8 to a standard string encoding in C#?


string c# encoding

Daniel B Sep 29 '14 at 15:29


1 answer

SLaks · Accepted Answer · 2014-09-29T15:33:26+0000

Strings do not have encodings or byte arrays. Encodings only come into play when you convert a string to a byte array; you can only do that by specifying which encoding to use to choose the bytes.
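To illustrate the point, here is a minimal sketch showing that one string produces different byte arrays depending on the encoding chosen, and that decoding with a mismatched encoding is what produces output like the "Â" in the question. It assumes the fetched value contains a no-break space (U+00A0) where a regular space was expected, which is consistent with the question's output but not confirmed by it. The `EncodingDemo` class and `Reinterpret` helper are hypothetical names, and ISO-8859-1 stands in for the legacy code page that `Encoding.Default` selects on many Windows .NET Framework setups.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: encode text with one encoding, then decode the bytes with another.
    public static string Reinterpret(string text, Encoding encodeWith, Encoding decodeWith)
    {
        return decodeWith.GetString(encodeWith.GetBytes(text));
    }

    public static void Main()
    {
        // Assumption: the TextBox value holds U+00A0 (no-break space) instead of U+0020.
        string fetchedVal = "Testdoc\u00A01.docx";
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // The same 14-character string yields different byte arrays per encoding.
        Console.WriteLine(Encoding.UTF8.GetBytes(fetchedVal).Length);    // 15 (U+00A0 takes two bytes)
        Console.WriteLine(Encoding.Unicode.GetBytes(fetchedVal).Length); // 28 (two bytes per UTF-16 code unit)

        // Decoding UTF-8 bytes as Latin-1 turns the bytes C2 A0 into two characters.
        string mangled = Reinterpret(fetchedVal, Encoding.UTF8, latin1);
        Console.WriteLine(mangled.Length); // 15

        // Encoding and decoding with the same encoding changes nothing,
        // so the string still contains U+00A0 and still differs from the literal.
        string roundTrip = Reinterpret(fetchedVal, Encoding.UTF8, Encoding.UTF8);
        Console.WriteLine(roundTrip == fetchedVal); // True
    }
}
```

Under that assumption, none of the conversions tried in the question can make the two strings equal: each one either returns the original characters unchanged or introduces extra ones.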

It looks like you actually just have different characters in your strings. You may have an invisible character in one of them, or they may contain different characters that look the same.

To find out, look at the Unicode code point value of each character in each string (for example, (int)str[0]).
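The check suggested above could be sketched as follows. `CodePointDump`, `Describe`, and `FirstDifference` are hypothetical names introduced for illustration, and the `fetchedVal` literal is an assumed stand-in for the real TextBox value (here containing a no-break space, U+00A0).

```csharp
using System;
using System.Linq;

public static class CodePointDump
{
    // Hypothetical helper: lists every UTF-16 code unit of s as U+XXXX.
    public static string Describe(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    // Hypothetical helper: zero-based index of the first differing position, or -1 if equal.
    public static int FirstDifference(string a, string b)
    {
        int shared = Math.Min(a.Length, b.Length);
        for (int i = 0; i < shared; i++)
        {
            if (a[i] != b[i]) return i;
        }
        // One string may be a prefix of the other.
        return a.Length == b.Length ? -1 : shared;
    }

    public static void Main()
    {
        string docname = "Testdoc 1.docx";
        string fetchedVal = "Testdoc\u00A01.docx"; // assumption: stands in for the TextBox value

        Console.WriteLine(Describe(docname));
        Console.WriteLine(Describe(fetchedVal));

        int index = FirstDifference(docname, fetchedVal);
        Console.WriteLine(index); // 7
        if (index >= 0 && index < docname.Length && index < fetchedVal.Length)
        {
            Console.WriteLine("U+" + ((int)docname[index]).ToString("X4"));    // U+0020
            Console.WriteLine("U+" + ((int)fetchedVal[index]).ToString("X4")); // U+00A0
        }
    }
}
```

If the differing character does turn out to be a no-break space, one option is to normalize it before comparing, for example with `fetchedVal.Replace('\u00A0', ' ')`. Whether that is appropriate depends on where the value comes from.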
