Escaping \x from strings

Well, I got this little method:

    static string escapeString(string str)
    {
        // Turn the literal sequences \r, \n and \t into real control characters.
        string s = str.Replace(@"\r", "\r").Replace(@"\n", "\n").Replace(@"\t", "\t");
        Regex regex = new Regex(@"\\x(..)");
        var matches = regex.Matches(s);
        foreach (Match match in matches)
        {
            // Parse the two characters after \x as a hex byte.
            s = s.Replace(match.Value, ((char)Convert.ToByte(match.Value.Replace(@"\x", ""), 16)).ToString());
        }
        return s;
    }

It replaces sequences such as "\x65" in the string that I get in args[0] with the corresponding character.

But here is my problem: "\\x65" is replaced as well, so I get "\e". I tried to come up with a regex that checks whether there is another backslash in front, but had no luck.

Can anyone give a hint?

2 answers

You could keep tweaking the regular expression with something like "(\s|\w)\\x(..)" to exclude the "\\x65" case. That will be fragile, though, because there is no guarantee that your \x65 sequence always has whitespace or a word character in front of it; it might sit at the very start of the input. Also, your regex matches \xTT, which is not a valid hex escape. Consider replacing the '.' with a character class, as in "\\x([0-9a-fA-F]{2})".
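To illustrate the character-class suggestion, here is a minimal sketch. The class and method names (HexEscapeDemo, ReplaceHexEscapes) are made up for this example, and it deliberately does not solve the escaped-backslash problem yet; it only shows that \xTT is left alone while valid hex escapes are converted.

```csharp
using System;
using System.Text.RegularExpressions;

static class HexEscapeDemo
{
    // A literal backslash, 'x', then exactly two hex digits (captured in group 1).
    static readonly Regex HexEscape = new Regex(@"\\x([0-9a-fA-F]{2})");

    // Replaces every \xHH sequence with the character whose code is HH.
    // Note: an escaped backslash in front of \x is NOT handled here.
    public static string ReplaceHexEscapes(string input)
    {
        return HexEscape.Replace(input, m =>
            ((char)Convert.ToByte(m.Groups[1].Value, 16)).ToString());
    }
}
```

With this pattern, ReplaceHexEscapes(@"\x65") gives "e", while @"\xTT" comes back unchanged. Using Regex.Replace with a callback also avoids calling string.Replace once per match.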

If this were a school project, I would do something like the following: replace every "\\" with some unlikely sequence, for example "_@!!@!!@_", run the regex and the other replacements, and then replace every occurrence of the unlikely sequence with "\". For instance:

    String s = inputString.Replace(@"\\", @"_@!!@!!@_");
    // do all of the regex, replacements, etc. here
    String output = s.Replace(@"_@!!@!!@_", @"\");

However, you should not do this in production code: if your input ever contains the magic sequence, you will get spurious backslashes.
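For completeness, a sketch of the whole placeholder round trip might look like the following. The names PlaceholderDemo and Unescape are invented for the example, and the hex pattern assumes the stricter character class rather than the original "(..)".

```csharp
using System;
using System.Text.RegularExpressions;

static class PlaceholderDemo
{
    // The "unlikely" marker; input that already contains it breaks the scheme.
    const string Placeholder = "_@!!@!!@_";

    static readonly Regex HexEscape = new Regex(@"\\x([0-9a-fA-F]{2})");

    public static string Unescape(string input)
    {
        // 1. Hide escaped backslashes so later steps cannot see them.
        string s = input.Replace(@"\\", Placeholder);
        // 2. Handle the simple escapes.
        s = s.Replace(@"\r", "\r").Replace(@"\n", "\n").Replace(@"\t", "\t");
        // 3. Handle \xHH.
        s = HexEscape.Replace(s, m =>
            ((char)Convert.ToByte(m.Groups[1].Value, 16)).ToString());
        // 4. Turn each hidden escaped backslash into a single backslash.
        return s.Replace(Placeholder, @"\");
    }
}
```

Here Unescape(@"\\x65") yields \x65 as intended, but an input consisting of just _@!!@!!@_ comes back as a single backslash, which is exactly the failure mode described above.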

It looks like you are writing some kind of interpreter. I feel obliged to recommend looking into something more robust, such as lexers, which use regular expressions to build finite state machines. Wikipedia has great articles on the topic, and I am a big fan of ANTLR. That may be overkill right now, but if you keep running into these special cases, consider solving your problem in a more general way.

Start reading here for theory: http://en.wikipedia.org/wiki/Lexical_analysis
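Short of a full lexer generator, the same idea can be sketched by hand: walk the input once, left to right, and decide at each backslash what the escape means. The following is only an illustration under my own assumptions (the names EscapeScanner, Unescape and IsHexDigit are invented, and unknown escapes are simply passed through), not something taken from ANTLR or the question.

```csharp
using System;
using System.Text;

static class EscapeScanner
{
    static bool IsHexDigit(char c)
    {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    public static string Unescape(string input)
    {
        var sb = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            // Ordinary character, or a trailing backslash: copy it unchanged.
            if (c != '\\' || i + 1 >= input.Length)
            {
                sb.Append(c);
                i++;
                continue;
            }

            char next = input[i + 1];
            if (next == '\\') { sb.Append('\\'); i += 2; }
            else if (next == 'r') { sb.Append('\r'); i += 2; }
            else if (next == 'n') { sb.Append('\n'); i += 2; }
            else if (next == 't') { sb.Append('\t'); i += 2; }
            else if (next == 'x' && i + 3 < input.Length
                     && IsHexDigit(input[i + 2]) && IsHexDigit(input[i + 3]))
            {
                // \xHH: convert the two hex digits to one character.
                sb.Append((char)Convert.ToByte(input.Substring(i + 2, 2), 16));
                i += 4;
            }
            else
            {
                // Unknown or malformed escape: keep the backslash as-is.
                sb.Append('\\');
                i++;
            }
        }
        return sb.ToString();
    }
}
```

Because an escaped backslash is consumed as a unit before anything after it is examined, "\\x65" naturally stays \x65, and any number of leading backslashes is handled without special cases.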

0
source

Use a negative lookbehind:

    Regex regex = new Regex(@"(?<!([^\\]|^)\\)\\x(..)");

This asserts that the \x is not preceded by a lone backslash (one that follows a non-backslash character or the start of the string). Because a lookbehind is zero-width, the preceding characters are checked but not consumed.
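The lookbehind guards against one escaping backslash. As an alternative sketch (my own variation, not part of this answer), one could match every escape, including the escaped backslash itself, in a single Regex.Replace pass, so that "\\" is consumed before the following x can be considered. The names SinglePassUnescape and Unescape are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class SinglePassUnescape
{
    // Group 1: escaped backslash, group 2: r/n/t, group 3: two hex digits after x.
    static readonly Regex Escape =
        new Regex(@"\\(?:(\\)|([rnt])|x([0-9a-fA-F]{2}))");

    public static string Unescape(string input)
    {
        return Escape.Replace(input, m =>
        {
            if (m.Groups[1].Success) return @"\";
            if (m.Groups[2].Success)
            {
                switch (m.Groups[2].Value)
                {
                    case "r": return "\r";
                    case "n": return "\n";
                    default: return "\t";
                }
            }
            // Otherwise group 3 matched: convert the hex digits to a character.
            return ((char)Convert.ToByte(m.Groups[3].Value, 16)).ToString();
        });
    }
}
```

Since matches are found left to right and never overlap, Unescape(@"\\x65") produces \x65 while Unescape(@"\x65") produces "e", regardless of how many backslash pairs come first.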


Source: https://habr.com/ru/post/1399814/

