Replacing Unicode characters in a string in C#

I have a string like:

string str = "ĄĆŹ – ćwrą";

How can I replace the non-ASCII characters (Ą, Ć, Ź, –, ć, ą) with their \uXXXX escape sequences? The result for this example string should be:

 str = "\u0104\u0106\u0179 \u2013 \u0107wr\u0105" 

Is there a quick way to do this? I do not want to call .Replace for each character.

1 answer

Converting to a JSON string like this is more cumbersome than it should be, mainly because you need to work with Unicode code points, which in practice means calling char.ConvertToUtf32. That in turn means you have to handle surrogate pairs; System.Globalization.StringInfo can help with this.

Here is the function that uses these building blocks to perform the conversion:

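    using System.Globalization;
    using System.Text;

    public string ToJsonString(string s)
    {
        var enumerator = StringInfo.GetTextElementEnumerator(s);
        var sb = new StringBuilder();
        while (enumerator.MoveNext())
        {
            // A text element can hold several code points (base character plus combining marks).
            var element = enumerator.GetTextElement();
            for (int i = 0; i < element.Length; i += char.IsSurrogatePair(element, i) ? 2 : 1)
            {
                var codePoint = char.ConvertToUtf32(element, i);
                if (codePoint < 0x80)
                    sb.Append((char)codePoint);
                else if (codePoint <= 0xffff)
                    sb.Append("\\u").Append(codePoint.ToString("x4"));
                else
                {
                    // Above the BMP: emit the UTF-16 surrogate pair, high surrogate first.
                    var v = codePoint - 0x10000;
                    sb.Append("\\u").Append((0xd800 + (v >> 10)).ToString("x4"));
                    sb.Append("\\u").Append((0xdc00 + (v & 0x3ff)).ToString("x4"));
                }
            }
        }
        return sb.ToString();
    }

    // Usage: var escaped = ToJsonString(str);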
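Since a JSON \uXXXX escape denotes a single UTF-16 code unit, and a C# string is already a sequence of UTF-16 code units, a shorter variant may be enough when all you need is to escape non-ASCII characters. The following is only a sketch under that assumption: the class name JsonEscapeSketch and the method name EscapeNonAscii are made up for illustration, and it does not escape quotes, backslashes, or control characters, so it is not a complete JSON string encoder.

```csharp
using System;
using System.Text;

public static class JsonEscapeSketch
{
    // Hypothetical helper: writes every non-ASCII UTF-16 code unit as \uXXXX.
    // Surrogate pairs need no special handling here, because each half of the
    // pair is its own char and is escaped separately, high surrogate first.
    public static string EscapeNonAscii(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (c < 0x80)
                sb.Append(c);                                      // ASCII: copy unchanged
            else
                sb.Append("\\u").Append(((int)c).ToString("x4"));  // four lowercase hex digits
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // The example string from the question, with an en dash (U+2013).
        Console.WriteLine(EscapeNonAscii("ĄĆŹ – ćwrą"));
        // prints \u0104\u0106\u0179 \u2013 \u0107wr\u0105

        // A character outside the BMP (U+1F600) comes out as a surrogate pair.
        Console.WriteLine(EscapeNonAscii("\U0001F600"));
        // prints \ud83d\ude00
    }
}
```

The trade-off is that this version never looks at code points at all, so it needs neither char.ConvertToUtf32 nor StringInfo; the code-point version above is the one to keep if you need to treat characters above U+FFFF differently from the rest.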

Source: https://habr.com/ru/post/1487715/

