Replace unicode escape sequences in string

We have a text file that contains the following text:

"\u5b89\u5fbd\u5b5f\u5143" 

When we read the file's contents in C# .NET, it shows up as:

 "\\u5b89\\u5fbd\\u5b5f\\u5143" 

Our decoder method:

    using System.Text;

    public string Decoder(string value)
    {
        Encoding enc = new UTF8Encoding();
        byte[] bytes = enc.GetBytes(value);
        return enc.GetString(bytes);
    }

When I hard-code the value:

 string Output=Decoder("\u5b89\u5fbd\u5b5f\u5143"); 

it works, but when we use a value held in a variable, it does not.

When we use the string read from the text file:

    string value = /* contents of the text file */;
    string Output = Decoder(value);

It returns the wrong output.

Please help me solve the problem.

6 answers

You can use a regular expression to parse the file:

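    using System.Globalization;
    using System.Text.RegularExpressions;

    // A literal backslash, 'u', then exactly four hex digits.
    private static readonly Regex _regex = new Regex(@"\\u(?<Value>[0-9a-fA-F]{4})", RegexOptions.Compiled);

    public string Decoder(string value)
    {
        return _regex.Replace(
            value,
            m => ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString()
        );
    }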

and then:

 string data = Decoder(File.ReadAllText("test.txt")); 
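As a quick check of the regex approach, here is a self-contained console sketch. The class name `EscapeDecoder` and method name `Decode` are hypothetical; the verbatim string `@"..."` stands in for the text read from the file, since it keeps each backslash as a real character, just as the file does.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class EscapeDecoder
{
    // Matches a literal backslash, 'u', then exactly four hex digits.
    private static readonly Regex UnicodeEscape =
        new Regex(@"\\u(?<Value>[0-9a-fA-F]{4})", RegexOptions.Compiled);

    // Replaces every \uXXXX sequence with the UTF-16 character it names.
    public static string Decode(string value)
    {
        return UnicodeEscape.Replace(
            value,
            m => ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString());
    }

    public static void Main()
    {
        // Verbatim string: real backslashes, like the file contents.
        string raw = @"\u5b89\u5fbd\u5b5f\u5143";
        string decoded = Decode(raw);

        Console.WriteLine(raw.Length);     // 24 (four escapes of six characters each)
        Console.WriteLine(decoded.Length); // 4
        Console.WriteLine(decoded == "\u5b89\u5fbd\u5b5f\u5143"); // True
    }
}
```

Text without escapes passes through unchanged, so the method is safe to run on a whole file.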

So your file contains the escaped text

 \u5b89\u5fbd\u5b5f\u5143 

in ASCII, rather than the string those four Unicode code points represent, stored in some particular encoding?

Either way, I recently wrote C# code that parses strings in this format for a JSON parser project. Here is a version that handles only \uXXXX:

    using System;
    using System.IO;
    using System.Text;

    private static string ReadSlashedString(TextReader reader)
    {
        var sb = new StringBuilder(32);
        bool q = false; // true right after a backslash
        while (true)
        {
            int chrR = reader.Read();
            if (chrR == -1) break;
            var chr = (char) chrR;
            if (!q)
            {
                if (chr == '\\') { q = true; continue; }
                sb.Append(chr);
            }
            else
            {
                switch (chr)
                {
                    case 'u':
                    case 'U':
                        // ReadBlock keeps reading until it has all four hex digits.
                        var hexb = new char[4];
                        if (reader.ReadBlock(hexb, 0, 4) != 4)
                            throw new Exception("Truncated \\u escape");
                        sb.Append((char) Convert.ToInt32(new string(hexb), 16));
                        break;
                    default:
                        throw new Exception("Invalid backslash escape (\\ + charcode " + (int) chr + ")");
                }
                q = false;
            }
        }
        return sb.ToString();
    }

and use it like this:

 var str = ReadSlashedString(new StringReader("\\u5b89\\u5fbd\\u5b5f\\u5143")); 

(or pass a StreamReader to read from a file).

Hope this helps!

EDIT: @Darin Dimitrov: the regular expression in your answer is probably faster, but I had this code on hand. :)


Use the code below, which unescapes any escape sequences in the input string:

 Regex.Unescape(value); 
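A small sketch of what `Regex.Unescape` (from `System.Text.RegularExpressions`) does with the file contents. One caveat worth knowing, based on my reading of the documented behavior: it unescapes every *regex* escape, not only `\uXXXX`, so `\t` becomes a tab, and an escape the regex parser does not recognize (such as `\q`) raises an `ArgumentException`. The class name `UnescapeDemo` is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    public static void Main()
    {
        // Text as it comes from the file: real backslashes, not yet decoded.
        string fromFile = @"\u5b89\u5fbd\u5b5f\u5143";
        string decoded = Regex.Unescape(fromFile);
        Console.WriteLine(decoded == "\u5b89\u5fbd\u5b5f\u5143"); // True

        // Other regex escapes are converted as well, e.g. \t becomes a tab.
        Console.WriteLine(Regex.Unescape(@"a\tb") == "a\tb"); // True

        // An escape the regex parser does not recognize is rejected.
        try
        {
            Regex.Unescape(@"\q");
        }
        catch (ArgumentException e)
        {
            Console.WriteLine("Rejected: " + e.Message);
        }
    }
}
```

If the file may contain other backslash sequences that must stay as they are, the regex replacement that targets only `\uXXXX` is the narrower choice.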

UTF8Encoding (or any other encoding) does not translate escape sequences such as \u5b89 into the corresponding character.

The reason it works when you pass a string constant is that the C# compiler interprets the escape sequences and translates them into the corresponding characters before the decoder is called (in fact, before the program even runs).

You need to write code that recognizes the escape sequences and converts them to the corresponding characters.
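To make this concrete, the sketch below runs the question's `Decoder` on both kinds of input. The class name `EncodingRoundTrip` is hypothetical, and the verbatim string is assumed to match what `File.ReadAllText` returns for the file in the question. Encoding to UTF-8 bytes and decoding them back returns the same characters it was given, so the escapes in the file text survive untouched.

```csharp
using System;
using System.Text;

public static class EncodingRoundTrip
{
    // The question's Decoder: encode to UTF-8 bytes, then decode them again.
    // For valid text this round trip is an identity operation.
    public static string Decoder(string value)
    {
        Encoding enc = new UTF8Encoding();
        byte[] bytes = enc.GetBytes(value);
        return enc.GetString(bytes);
    }

    public static void Main()
    {
        // Regular literal: the compiler turns each \uXXXX into one character.
        string compiled = "\u5b89\u5fbd\u5b5f\u5143";
        // Verbatim literal: backslash, 'u' and four hex digits stay as text.
        string asInFile = @"\u5b89\u5fbd\u5b5f\u5143";

        Console.WriteLine(compiled.Length); // 4
        Console.WriteLine(asInFile.Length); // 24
        Console.WriteLine(Decoder(compiled) == compiled); // True
        Console.WriteLine(Decoder(asInFile) == asInFile); // True: escapes are not decoded
    }
}
```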


When you read "\u5b89\u5fbd\u5b5f\u5143", you get exactly what is in the file. The debugger escapes your strings before displaying them: the double backslashes you see are single backslashes shown in escaped form.

When you pass a hard-coded value, you are not actually passing what you see on the screen. You are passing four Unicode characters, because the C# compiler has already converted the escape sequences in the string literal.
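The display point can be checked directly. In this sketch (class name `DebuggerDisplay` is hypothetical), the escaped form `"\\u5b89"` that the debugger shows and the verbatim form `@"\u5b89"` are the same six-character string, whose first character is one backslash.

```csharp
using System;

public static class DebuggerDisplay
{
    public static void Main()
    {
        // The escaped form, as a debugger would display it.
        string shownByDebugger = "\\u5b89";
        // The same text written as a verbatim literal.
        string verbatim = @"\u5b89";

        Console.WriteLine(shownByDebugger == verbatim); // True
        Console.WriteLine(verbatim[0] == '\\');         // True: a single backslash character
        Console.WriteLine(verbatim.Length);             // 6
        Console.WriteLine(verbatim);                    // \u5b89
    }
}
```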

Darin has already posted a way to unescape Unicode characters from a file, so I will not repeat it.


I think this will give you an idea:

  string str = "ivandro\u0020"; str = str.Trim(); 

If you print the string, you will notice that the space has been removed.
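The example works because the compiler has already turned `\u0020` into an ordinary space before `Trim()` runs. A short sketch (class name `TrimDemo` is hypothetical) contrasts it with a verbatim string, where `\u0020` stays as six literal characters that `Trim()` leaves alone, just like the text read from the file.

```csharp
using System;

public static class TrimDemo
{
    public static void Main()
    {
        // Regular literal: \u0020 is compiled into a real space.
        string compiled = "ivandro\u0020";
        // Verbatim literal: \u0020 remains backslash, 'u', '0', '0', '2', '0'.
        string literal = @"ivandro\u0020";

        Console.WriteLine("[" + compiled.Trim() + "]"); // [ivandro]
        Console.WriteLine("[" + literal.Trim() + "]");  // [ivandro\u0020]
    }
}
```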


Source: https://habr.com/ru/post/1401812/

