Replace unicode escape sequences in string

Question


We have a text file that contains the following text:

"\u5b89\u5fbd\u5b5f\u5143"

When we read the file's contents in C# .NET, the string shows up as:

 "\\u5b89\\u5fbd\\u5b5f\\u5143"

Our decoder method:

    public string Decoder(string value)
    {
        Encoding enc = new UTF8Encoding();
        byte[] bytes = enc.GetBytes(value);
        return enc.GetString(bytes);
    }

When I hard-code the value:

 string Output=Decoder("\u5b89\u5fbd\u5b5f\u5143");

it works, but when we pass a variable instead, it does not.

When we use the string read from the text file:

    // value holds the contents of the text file
    string Output = Decoder(value);

It returns the wrong output.

Please help me solve the problem.


c# .net

Prateek saluja Mar 16 '12 at 13:37


6 answers

So your file contains the literal text

 \u5b89\u5fbd\u5b5f\u5143

in ASCII, and not the string represented by these four Unicode code points in some given encoding?

Be that as it may, I recently wrote C# code that parses strings in this format for a JSON parser project. Here's a version that only handles \uXXXX:

    using System;
    using System.IO;
    using System.Text;

    // Copies characters from the reader, converting \uXXXX escapes to characters.
    private static string ReadSlashedString(TextReader reader)
    {
        var sb = new StringBuilder(32);
        bool q = false; // true right after a backslash has been read
        while (true)
        {
            int chrR = reader.Read();
            if (chrR == -1) break; // end of input
            var chr = (char) chrR;
            if (!q)
            {
                if (chr == '\\') { q = true; continue; }
                sb.Append(chr);
            }
            else
            {
                switch (chr)
                {
                    case 'u':
                    case 'U':
                        // Read the next four hex digits and append that UTF-16 code unit.
                        var hexb = new char[4];
                        reader.Read(hexb, 0, 4);
                        chr = (char) Convert.ToInt32(new string(hexb), 16);
                        sb.Append(chr);
                        break;
                    default:
                        throw new Exception("Invalid backslash escape (\\ + charcode " + (int) chr + ")");
                }
                q = false;
            }
        }
        return sb.ToString();
    }

and you can use it like this:

 var str = ReadSlashedString(new StringReader("\\u5b89\\u5fbd\\u5b5f\\u5143"));

(or pass a StreamReader to read from a file).

Hope this helps!

EDIT: @Darin Dimitrov's regular expression answer is probably faster, but I had this code on hand. :)


AKX Mar 16 '12 at 13:47


Use the code below, which unescapes any escape sequences in the input string:

    string output = Regex.Unescape(value); // needs using System.Text.RegularExpressions;
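A minimal sketch of that call wrapped in a helper (the class name UnescapeDemo is hypothetical). Keep in mind that Regex.Unescape is meant for regular-expression text: besides \uXXXX it also converts escapes such as \t and \n, and it can throw on escape sequences it does not recognize, so check that this suits your file's contents.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper name, for illustration only.
public static class UnescapeDemo
{
    // Regex.Unescape converts \uXXXX (and other regex escapes such as \t or \n)
    // into the characters they represent and returns a new string.
    public static string Decode(string value) => Regex.Unescape(value);
}
```

For example, UnescapeDemo.Decode(@"\u5b89\u5fbd\u5b5f\u5143") yields the same four characters as the hard-coded literal in the question.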


Sagar May 14, '14 at 8:50


UTF8Encoding (or any other encoding) does not translate escape sequences such as \u5b89 into the corresponding character.

It works when you pass a string literal because the C# compiler interprets the escape sequences and translates them into the corresponding characters before the decoder is called (in fact, before the program even runs).

You need to write code that recognizes the escape sequences and converts them to the appropriate characters.
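To illustrate, a minimal sketch that wraps the question's Decoder in a hypothetical RoundTripDemo class. In the checks, a verbatim literal such as @"\u5b89..." stands in for the file contents, since the @ prefix turns off compile-time escape processing.

```csharp
using System.Text;

// Hypothetical wrapper around the question's Decoder method.
public static class RoundTripDemo
{
    // Encodes the string to UTF-8 bytes and decodes the bytes straight back.
    // For well-formed text the result equals the input; the backslash escapes
    // are never interpreted at any point.
    public static string Decoder(string value)
    {
        Encoding enc = new UTF8Encoding();
        byte[] bytes = enc.GetBytes(value);
        return enc.GetString(bytes);
    }
}
```

Passing the regular literal "\u5b89\u5fbd\u5b5f\u5143" returns the four characters it already contained, while passing the verbatim literal returns the escaped text unchanged, backslashes included.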


Mimo Mar 16 '12 at 13:44


When you read "\u5b89\u5fbd\u5b5f\u5143" from the file, you get exactly what is in the file. The debugger escapes your strings before displaying them: the double backslashes it shows are really single backslashes that have been escaped.

When you pass a hard-coded value, you are not actually passing what you see on the screen. You are passing four Unicode characters, because the C# compiler has already unescaped the string literal.
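A minimal sketch of the difference (the class and constant names are hypothetical), assuming the file stores the escapes as plain ASCII text:

```csharp
// Hypothetical names, for illustration only.
public static class LiteralLengths
{
    // Regular literal: the compiler turns each \uXXXX into one char,
    // so this string is 4 characters long.
    public const string HardCoded = "\u5b89\u5fbd\u5b5f\u5143";

    // Verbatim literal: the backslashes are kept, as in the file on disk,
    // so each escape is 6 characters and the string is 24 characters long.
    public const string AsInFile = @"\u5b89\u5fbd\u5b5f\u5143";
}
```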

Darin has already posted a way to unescape Unicode characters read from a file, so I won't repeat it.


Kendall frey Mar 16 '12 at 13:48


I think this will give you an idea:

    string str = "ivandro\u0020";
    str = str.Trim();

If you print the string, you will notice that the space has been removed.
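This works because "\u0020" appears in a string literal, so the compiler has already turned it into a space. A minimal sketch (hypothetical names) contrasting that with text that still contains the escape sequence itself, as the question's file does:

```csharp
// Hypothetical names, for illustration only.
public static class TrimDemo
{
    // The compiler unescapes \u0020 into an ordinary space, so Trim() removes it.
    public static string FromLiteral() => "ivandro\u0020".Trim();

    // With a verbatim literal (like text read from a file) the string ends in
    // the characters '0','0','2','0', so there is no trailing space to trim.
    public static string FromFileText() => @"ivandro\u0020".Trim();
}
```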


Ivandro ismael Jun 08 '14 at 1:38


Darin Dimitrov · Accepted Answer · 2012-03-16T13:46:14+0000

You can use a regular expression to parse the file:

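    using System.Globalization;
    using System.Text.RegularExpressions;

    // Matches \u followed by four hex digits and captures the digits as "Value".
    private static Regex _regex = new Regex(@"\\u(?<Value>[0-9a-fA-F]{4})", RegexOptions.Compiled);

    public string Decoder(string value)
    {
        // Replace each match with the single character whose code is that hex number.
        return _regex.Replace(
            value,
            m => ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString()
        );
    }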

and then:

 string data = Decoder(File.ReadAllText("test.txt"));
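Because each \uXXXX escape becomes a single UTF-16 code unit, this approach should also reassemble characters outside the Basic Multilingual Plane that are written as surrogate pairs (for example \ud83d\ude00), and it leaves text between escapes untouched. A self-contained sketch to check both, with a hypothetical class name and the pattern limited to hex digits:

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical name; same regex-replace technique as the accepted answer.
public static class UnicodeEscapeDecoder
{
    // \u followed by exactly four hex digits, captured as "Value".
    private static readonly Regex Escape =
        new Regex(@"\\u(?<Value>[0-9a-fA-F]{4})", RegexOptions.Compiled);

    public static string Decode(string value) =>
        Escape.Replace(value, m =>
        {
            // Each escape maps to exactly one UTF-16 code unit, so a surrogate
            // pair written as two escapes comes out as two chars that together
            // form one character.
            int code = int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber);
            return ((char)code).ToString();
        });
}
```

Sequences that are not followed by four hex digits, such as \uzzzz, simply do not match and are left as they are.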
