Using Regex Replace when searching for unescaped characters

My requirement is basically this. If I have a line of text like

"There once was an 'ugly' duckling but it could never have been \'Scarlett\' Johansen" 

then I would like to match the quotes that have not already been escaped. Those would be the ones around 'ugly', not the ones around 'Scarlett'.

I spent quite a bit of time testing things in a small C# console application and came up with the following solution.

    private static void RegexFunAndGames()
    {
        string result;
        string sampleText = @"Mr. Grant and Ms. Kelly starred in the film \'To Catch A Thief' but not in 'Stardust' because they'd stopped acting by then";
        string rePattern = @"\\'";
        string replaceWith = "'";

        Console.WriteLine(sampleText);

        // Step 1: unescape the quotes that are already escaped (\' becomes ').
        Regex regEx = new Regex(rePattern);
        result = regEx.Replace(sampleText, replaceWith);

        // Step 2: escape every quote (' becomes \').
        result = result.Replace("'", @"\'");

        Console.WriteLine(result);
    }

Basically, what I did was a two-step process: find the characters that are already escaped, unescape them, and then escape everything again. That seems a little clumsy, and I feel there might be a better way.

Testing Information

I received two really good answers, so I thought it was worth running a test to see which performs better. I have two functions:

    private static string RegexReplace(string sampleText)
    {
        Regex regEx = new Regex("(?<!\\\\)'");
        return regEx.Replace(sampleText, "\\'");
    }

    private static string ReplaceTest(string sampleText)
    {
        return sampleText.Replace(@"\'", "'").Replace("'", @"\'");
    }

And I call them from the Main method of a console application:

    static void Main(string[] args)
    {
        string sampleText = @"Mr. Grant and Ms. Kelly starred in the film \'To Catch A Thief' but not in 'Stardust' because they'd stopped acting by then.";
        string testReplace = string.Empty;

        System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

        sw.Start();
        for (int i = 1000000; i > 0; i--)
        {
            testReplace = ReplaceTest(sampleText);
        }
        sw.Stop();
        Console.WriteLine("This method took '" + sw.ElapsedMilliseconds.ToString() + "'");

        sw.Reset();

        sw.Start();
        for (int i = 1000000; i > 0; i--)
        {
            testReplace = RegexReplace(sampleText);
        }
        sw.Stop();
        Console.WriteLine("This method took '" + sw.ElapsedMilliseconds.ToString() + "'");
    }

The ReplaceTest method takes 2068 milliseconds. The RegexReplace method takes 9372 milliseconds. I ran this test several times and ReplaceTest was always the faster of the two.

+4
3 answers

You can use a negative lookbehind to make sure the quote is not escaped. The expression below

 (?<!\\)' 

matches a single quote only if it is not preceded by a backslash.

Please note that backslashes inside regular (non-verbatim) string literals must be doubled.

    var sampleText = @"Mr. Grant and Ms. Kelly starred in the film \'To Catch A Thief' but not in 'Stardust' because they'd stopped acting by then";
    var regEx = new Regex("(?<!\\\\)'");
    var result = regEx.Replace(sampleText, "\\'");
    Console.WriteLine(result);

The code above prints

 Mr. Grant and Ms. Kelly starred in the film \'To Catch A Thief\' but not in \'Stardust\' because they\'d stopped acting by then 
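To illustrate the point about doubled backslashes, here is a minimal sketch showing that a regular string literal and a verbatim (`@"..."`) literal can spell the same pattern. The class name `VerbatimPatternDemo` and the helper `EscapeUnescapedQuotes` are made up for this illustration and are not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

class VerbatimPatternDemo
{
    // Regular literal: each backslash meant for the regex engine is doubled again for C#.
    public const string RegularPattern = "(?<!\\\\)'";

    // Verbatim literal: C# takes the backslashes literally, so the regex reads as written.
    public const string VerbatimPattern = @"(?<!\\)'";

    // Hypothetical helper: escapes only the quotes not already preceded by a backslash.
    public static string EscapeUnescapedQuotes(string text)
    {
        return Regex.Replace(text, VerbatimPattern, @"\'");
    }

    static void Main()
    {
        // Both literals produce the same pattern text.
        Console.WriteLine(RegularPattern == VerbatimPattern); // True

        Console.WriteLine(EscapeUnescapedQuotes(@"a \'b' c")); // a \'b\' c
    }
}
```

The verbatim form is arguably easier to compare against the bare expression `(?<!\\)'` shown earlier, since nothing has to be doubled a second time.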

Link to ideone.

+3

I am surprised that you are using RegEx for this. Why not simply use:

 string result = sampleText.Replace(@"\'", "'").Replace("'", @"\'"); 

This will escape all the unescaped single quotes (').

First, every already-escaped single quote is unescaped (\' becomes '), and then every single quote is escaped (' becomes \').

Of course, if using RegEx is a requirement, go with the regex-based answer.
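As a rough sketch of the two chained `Replace` calls, the snippet below prints the intermediate string after the first step and then the final result. The class name `TwoStepReplaceDemo`, the helper `EscapeQuotes`, and the shortened sample text are assumptions made for illustration only.

```csharp
using System;

class TwoStepReplaceDemo
{
    // Hypothetical helper wrapping the one-liner from this answer.
    public static string EscapeQuotes(string text)
    {
        // Step 1: turn every \' back into a plain '.
        string unescaped = text.Replace(@"\'", "'");

        // Step 2: escape every ' so each quote ends up with exactly one backslash.
        return unescaped.Replace("'", @"\'");
    }

    static void Main()
    {
        string sample = @"the film \'To Catch A Thief' but not 'Stardust'";

        // Intermediate state after step 1.
        Console.WriteLine(sample.Replace(@"\'", "'"));
        // the film 'To Catch A Thief' but not 'Stardust'

        Console.WriteLine(EscapeQuotes(sample));
        // the film \'To Catch A Thief\' but not \'Stardust\'
    }
}
```

Because step 1 normalizes the input first, running the helper on text that is already fully escaped returns it unchanged.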

+3

You can use

    string rePattern = @"[\\'|\']";

instead.

-1

Source: https://habr.com/ru/post/1446496/

