C#: Replace a single / with a single \

I found a lot of posts answering similar questions (for example, "How to replace \ with /" or "How to replace \\ with \"). I understand all of that, but none of them solves my particular problem. Here it is:

I am reading a path string from the registry that contains "///" instead of "\" (easy to replace), but also "/u00xy" for Unicode characters. To parse the string into the correct Unicode characters, I have to replace each / with a single \. But every approach I tried (string.Replace or Regex.Replace) yields "\\u00xy" instead of "\u00xy"!

Or I get the error "Unrecognized escape sequence". For instance:

 str.Replace("/u00", @"\u00") // results in "\\u00" 

While:

 str.Replace("/u00", "\u00") // gives an error. 

I'm out of ideas!

2 answers

I believe something like this should work for you:

    using System;
    using System.Globalization;
    using System.Text.RegularExpressions;

    namespace Test
    {
        public class Program
        {
            public static void Main(string[] args)
            {
                Console.WriteLine(ConvertUnicodeEscapes("aa/u00C4bb/u00C4cc/u00C4dd/u00C4ee"));
                // prints aaÄbbÄccÄddÄee
            }

            // Matches "/u" followed by four uppercase hexadecimal digits.
            private static Regex r = new Regex("/u([0-9A-F]{4})");

            private static string ConvertUnicodeEscapes(string input)
            {
                return r.Replace(input, m =>
                {
                    // Parse the hex digits and turn the code point into a string.
                    int code = int.Parse(m.Groups[1].Value, NumberStyles.HexNumber);
                    return char.ConvertFromUtf32(code);
                });
            }
        }
    }

As John noted, this is not simply a matter of replacing "/" with a single "\". You cannot do it that way, because "\" is an escape character only in source code, not in the string data. Instead, we first match the "/uXXXX" groups. Then we convert the hexadecimal part of each match (XXXX) to an integer UTF-32 code point (i.e. a Unicode value). Finally, we get the character corresponding to that code point.
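The same steps can be sketched with a slightly more tolerant pattern. This is only an illustration: the names `EscapeDecoder` and `Decode` are made up for this example, and it assumes the registry data may also contain lowercase hex digits. It casts to `char` instead of calling `char.ConvertFromUtf32`, on the assumption that surrogate halves written as two consecutive escapes should be kept rather than rejected.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class EscapeDecoder
{
    // Matches "/u" followed by exactly four hex digits, upper or lower case.
    private static readonly Regex Escape = new Regex("/u([0-9A-Fa-f]{4})");

    // Hypothetical helper: replaces each "/uXXXX" with the UTF-16 code unit it names.
    public static string Decode(string input)
    {
        return Escape.Replace(input, m =>
        {
            int code = int.Parse(m.Groups[1].Value,
                                 NumberStyles.HexNumber,
                                 CultureInfo.InvariantCulture);
            // Four hex digits always fit in a char (0x0000..0xFFFF).
            return ((char)code).ToString();
        });
    }

    public static void Main()
    {
        Console.WriteLine(Decode("aa/u00c4bb/u00C4cc"));
    }
}
```

Text that does not match the pattern, such as "/u00" followed by non-hex characters, is left unchanged.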


EDIT: Now that I understand what you're trying to do, it's not at all surprising that it doesn't work. You are not talking about the "internal" representation of a string; you are really asking for the C# string literal syntax rules to be applied at runtime.

If you write:

 string x = "\u0041"; 

... that creates a string containing a single character ("A"). The fact that it was written as a Unicode escape sequence in the source code does not affect the string. The code above is therefore indistinguishable at runtime from:

 string x = "A"; 

Now it sounds like you want to parse a string containing a slash, then u, then four hexadecimal digits into a single character. You will need to do this yourself or find a library that does it; you should not expect string.Replace to do this for you.

In other words, it is important to understand the difference between the data itself and the way that data is represented in source code.
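A small sketch of that distinction, using a regular literal and a verbatim literal (the class name `LiteralDemo` is made up for this example):

```csharp
using System;

public static class LiteralDemo
{
    public static void Main()
    {
        string escaped = "\u0041";    // regular literal: the compiler decodes the escape
        string plain = "A";
        string verbatim = @"\u0041";  // verbatim literal: backslash, 'u' and four digits stay as data

        Console.WriteLine(escaped == plain);    // True
        Console.WriteLine(escaped.Length);      // 1
        Console.WriteLine(verbatim.Length);     // 6
        Console.WriteLine(verbatim == escaped); // False
    }
}
```

The escape is resolved once, by the compiler. A string that merely contains the six characters of an escape at runtime is never decoded automatically.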


You state:

 str.Replace("/u00", @"\u00") 

results in "\\u00".

No, it really doesn't. If you print the result to the console, you will see only a single backslash.

I strongly suspect you are looking at the string in a debugger, which shows an escaped view.

Demo code:

    using System;

    class Test
    {
        static void Main()
        {
            string input = "x/u00y";
            string output = input.Replace("/u00", @"\u00");
            Console.WriteLine(output); // Result: x\u00y
        }
    }
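One way to check this without relying on how a debugger renders the string is to count the characters directly. A minimal sketch; the class and the `CountBackslashes` helper are hypothetical names introduced for this example:

```csharp
using System;
using System.Linq;

public static class BackslashCount
{
    // Hypothetical helper: counts the backslash characters actually stored in the string.
    public static int CountBackslashes(string s) => s.Count(c => c == '\\');

    public static void Main()
    {
        string output = "x/u00y".Replace("/u00", @"\u00");

        Console.WriteLine(output.Length);            // 6
        Console.WriteLine(CountBackslashes(output)); // 1
    }
}
```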

This code:

 str.Replace("/u00", "\u00") 

does fail, because the string literal "\u00" is invalid: it is an incomplete Unicode escape sequence.


Source: https://habr.com/ru/post/1444162/

