How to convert from Unicode to ASCII

Is there a way to convert Unicode values to ASCII?

5 answers

To simply remove accents from Unicode characters, you can use something like:

 // Requires: using System.Globalization; using System.Linq; using System.Text;
 string.Concat(
     input.Normalize(NormalizationForm.FormD)
          .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
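
For illustration, the one-liner above can be wrapped in a small helper. This is only a sketch: the class and method names (AccentStripper, RemoveAccents) are hypothetical, and the final recomposition to FormC is an added assumption, not part of the original answer.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class AccentStripper
{
    // Hypothetical helper name; wraps the expression shown above.
    public static string RemoveAccents(string input)
    {
        // FormD splits a letter such as U+00E9 into 'e' + U+0301 (combining acute accent).
        string decomposed = input.Normalize(NormalizationForm.FormD);

        // Drop the combining marks and keep every other character.
        var kept = decomposed.Where(c =>
            CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);

        // Assumption: recompose so that any remaining characters are in composed form.
        return string.Concat(kept).Normalize(NormalizationForm.FormC);
    }
}
```

Calling AccentStripper.RemoveAccents("caf\u00E9 cr\u00E8me") should give "cafe creme". Letters that have no decomposition, such as U+0141 (L with stroke), pass through unchanged, so the result is not guaranteed to be pure ASCII.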

Technically, yes, you can use Encoding.ASCII.

Example (from byte[] to ASCII):

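 // Encode the string as UTF-16 bytes (called "Unicode" in .NET)
 byte[] uni = Encoding.Unicode.GetBytes("Whatever unicode string you have");
 // Re-encode the UTF-16 bytes as ASCII; characters outside ASCII become '?'
 byte[] asciiBytes = Encoding.Convert(Encoding.Unicode, Encoding.ASCII, uni);
 // Decode the ASCII bytes into a string
 string ascii = Encoding.ASCII.GetString(asciiBytes);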

Just remember that Unicode covers a far larger character set than ASCII, so there will be characters that simply cannot be encoded correctly. Look here for tables and a bit more information about these two encodings.
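
To make the "cannot be correctly encoded" point concrete, here is a minimal sketch of two behaviours: the default lossy conversion, and a strict variant that reports failure instead of substituting characters. The class and method names are hypothetical. The strict variant assumes you would rather detect non-ASCII input than silently lose it.

```csharp
using System;
using System.Text;

public static class AsciiConversion
{
    // Lossy: with the default Encoding.ASCII, each non-ASCII character becomes '?'.
    public static string ToAsciiLossy(string input)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(input);
        return Encoding.ASCII.GetString(bytes);
    }

    // Strict: returns false if any character cannot be represented in ASCII.
    public static bool TryToAsciiStrict(string input, out byte[] bytes)
    {
        // An ASCII encoding that throws instead of substituting '?'.
        Encoding strict = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            bytes = strict.GetBytes(input);
            return true;
        }
        catch (EncoderFallbackException)
        {
            bytes = Array.Empty<byte>();
            return false;
        }
    }
}
```

With this sketch, AsciiConversion.ToAsciiLossy("na\u00EFve") should return "na?ve", while TryToAsciiStrict returns false for the same input and true for "naive".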

This workaround may better suit your needs. It removes the non-ASCII characters from the string and keeps only the ASCII characters.

 byte[] bytes = Encoding.ASCII.GetBytes("eéêëèiïaâäàåcç  test");
 char[] chars = Encoding.ASCII.GetChars(bytes);
 string line = new String(chars);
 // Non-ASCII characters were encoded as '?'; remove them
 line = line.Replace("?", "");
 // Results in "eiac test"

Note that the second "space" in the input string is not a regular space but a character with code 255, which lies outside the ASCII range (0 to 127), so it is removed as well.
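
One caveat with the Replace("?", "") step is that it also deletes question marks that were genuinely part of the input. A minimal alternative sketch filters on the character code instead. The names AsciiFilter and KeepAsciiOnly are hypothetical and not from the original answer.

```csharp
using System.Linq;

public static class AsciiFilter
{
    // Hypothetical helper: keeps only characters in the ASCII range (0 to 127).
    public static string KeepAsciiOnly(string input)
    {
        // Every non-ASCII char, including both halves of a surrogate pair, is > 127.
        return new string(input.Where(c => c <= 127).ToArray());
    }
}
```

For example, AsciiFilter.KeepAsciiOnly("e\u00E9 why?") should return "e why?", preserving the real question mark.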

Well, considering that there are more than 100,000 Unicode characters and only 128 ASCII characters, a 1-1 mapping is obviously not possible.

You can use the Encoding.ASCII object to get ASCII byte values from a Unicode string.

You CANNOT convert from Unicode to ASCII. Almost every character in Unicode cannot be expressed in ASCII, and those that can be expressed have exactly the same code points in ASCII as in UTF-8, which is probably what you have. About the only thing you can do that is even close to correct is to drop all characters above code point 127, and even this most likely does not come close to your requirements. (Another possibility is to simplify accented or umlauted letters so that more than 128 characters become "almost" expressible, but that still doesn't even begin to really encompass Unicode.)
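
The claim that ASCII-expressible characters are encoded identically in ASCII and UTF-8 can be checked with a short sketch. The class and method names here are hypothetical.

```csharp
using System.Linq;
using System.Text;

public static class EncodingComparison
{
    // Returns true when the ASCII and UTF-8 encodings of the input are byte-for-byte equal.
    public static bool SameBytesInAsciiAndUtf8(string input)
    {
        byte[] ascii = Encoding.ASCII.GetBytes(input);
        byte[] utf8 = Encoding.UTF8.GetBytes(input);
        return ascii.SequenceEqual(utf8);
    }
}
```

For pure ASCII input such as "Hello" this should return true. For "h\u00E9llo" it returns false, because UTF-8 needs two bytes for U+00E9 while Encoding.ASCII substitutes a single '?'.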

Source: https://habr.com/ru/post/1304337/

