How can a regex ignore escaped quotes when matching strings?

I am trying to write a regular expression that will match everything except an apostrophe that has not been escaped. Consider the following:

<?php $s = 'Hi everyone, we\'re ready now.'; ?> 

My goal is to write a regular expression that essentially matches the string portion of this. I am thinking of something like

 /.*'([^']).*/ 

to match a simple string, but I was trying to figure out how to get a negative lookbehind working on this apostrophe to ensure that it is not preceded by a backslash ...

Any ideas?

- JMT

+5
6 answers
 <?php
 $backslash = '\\';
 $pattern = <<< PATTERN
 #(["'])(?:{$backslash}{$backslash}?+.)*?{$backslash}1#
 PATTERN;

 foreach(array(
     "<?php \$s = 'Hi everyone, we\\'re ready now.'; ?>",
     '<?php $s = "Hi everyone, we\\"re ready now."; ?>',
     "xyz'a\\'bc\\d'123",
     "x = 'My string ends with with a backslash\\\\';"
 ) as $subject) {
     preg_match($pattern, $subject, $matches);
     echo $subject , ' => ', $matches[0], "\n\n";
 }

prints

 <?php $s = 'Hi everyone, we\'re ready now.'; ?> => 'Hi everyone, we\'re ready now.'

 <?php $s = "Hi everyone, we\"re ready now."; ?> => "Hi everyone, we\"re ready now."

 xyz'a\'bc\d'123 => 'a\'bc\d'

 x = 'My string ends with with a backslash\\'; => 'My string ends with with a backslash\\'

+3

Here is my solution with test cases:

 /.*?'((?:\\\\|\\'|[^'])*+)'/ 

And my test script (Perl, though I don't think I use any Perl-specific features):

 use strict;
 use warnings;

 my %tests = ();
 $tests{'Case 1'} = <<'EOF';
 $var = 'My string';
 EOF

 $tests{'Case 2'} = <<'EOF';
 $var = 'My string has it\'s challenges';
 EOF

 $tests{'Case 3'} = <<'EOF';
 $var = 'My string ends with a backslash\\';
 EOF

 foreach my $key (sort (keys %tests)) {
     print "$key...\n";
     if ($tests{$key} =~ m/.*?'((?:\\\\|\\'|[^'])*+)'/) {
         print " ... '$1'\n";
     } else {
         print " ... NO MATCH\n";
     }
 }

Running this shows:

 $ perl a.pl
 Case 1...
  ... 'My string'
 Case 2...
  ... 'My string has it\'s challenges'
 Case 3...
  ... 'My string ends with a backslash\\'

Note that the initial wildcard at the beginning needs to be non-greedy. Then I use a non-backtracking (possessive) match to gobble up \\ and \', and then anything else that is not a single quote character.

I think this one probably mimics the compiler's built-in approach, which should make it pretty bulletproof.
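
To see why the leading wildcard has to be non-greedy, here is a minimal JavaScript sketch. It is only an illustration: JavaScript has no possessive quantifier, so the `*+` from the pattern is swapped for a plain greedy `*`, and the helper `capture` and the sample line are hypothetical.

```javascript
// JavaScript has no possessive quantifier, so `*+` from the Perl pattern
// is replaced by a plain greedy `*` here (an assumption of this sketch).
const lazyPrefix = /.*?'((?:\\\\|\\'|[^'])*)'/;
const greedyPrefix = /.*'((?:\\\\|\\'|[^'])*)'/;

// Hypothetical helper: returns capture group 1, or null when nothing matches.
function capture(re, src) {
  const m = re.exec(src);
  return m ? m[1] : null;
}

// Sample input with two quoted strings on one line.
const line = String.raw`$a = 'one'; $b = 'ends with a backslash\\';`;

console.log(capture(lazyPrefix, line));   // body of the first quoted string
console.log(capture(greedyPrefix, line)); // body of the last quoted string
```

With the lazy prefix the capture is the first string on the line; with a greedy prefix the regex backtracks from the end of the line and latches onto the last one instead.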

+3
 /.*'([^'\\]|\\.)*'.*/ 

The parenthesized part matches characters that are neither apostrophes nor backslashes, as well as backslash-escaped characters. If only certain characters can be escaped, change \\. to \\['\\a-z] or whatever fits.
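
The pattern above can be tried out with a minimal JavaScript sketch. The helper name `firstQuoted` and the sample inputs are hypothetical, and the surrounding `.*` is dropped because only the quoted part is wanted here.

```javascript
// Hypothetical helper: returns the first single-quoted literal in `src`,
// quotes included, or null if there is none.
function firstQuoted(src) {
  // '            opening apostrophe
  // (?: ... )*   any run of: a char that is neither ' nor \,
  //              or a backslash followed by any one char (except a newline)
  // '            closing apostrophe
  const m = /'(?:[^'\\]|\\.)*'/.exec(src);
  return m ? m[0] : null;
}

console.log(firstQuoted(String.raw`$s = 'Hi everyone, we\'re ready now.';`));
console.log(firstQuoted(String.raw`x = 'ends with a backslash\\';`));
```

Because a backslash is always consumed together with the character after it, an escaped apostrophe cannot close the string, while an escaped backslash right before the closing apostrophe is handled correctly.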

+2

Using negative lookbehind:

 /
 .*?'               #Match until '
 (
  .*?               #Lazy match & capture of everything after the first apostrophe
 )
 (?<!(?<!\\)\\)'    #Match first apostrophe that isn't preceded by \, but accept \\
 .*                 #Match remaining text
 /x
0
 Regex reg = new Regex("(?<!\\\\)'(?<string>.*?)(?<!\\\\)'"); 
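
As a rough check of how a single negative lookbehind behaves, here is a JavaScript port of the same pattern, run against two of the test strings used elsewhere in this thread. It is a sketch only: it assumes an ES2018+ engine (lookbehind and named groups), and the helper `body` is hypothetical.

```javascript
// JavaScript port of the lookbehind pattern (needs ES2018+ for
// lookbehind assertions and named capture groups).
const lookbehind = /(?<!\\)'(?<string>.*?)(?<!\\)'/;

// Hypothetical helper: returns the captured body, or null when nothing matches.
function body(src) {
  const m = lookbehind.exec(src);
  return m ? m.groups.string : null;
}

console.log(body(String.raw`$s = 'Hi everyone, we\'re ready now.';`));
console.log(body(String.raw`x = 'My string ends with a backslash\\';`));
```

The first call returns the body with its escaped apostrophe intact. The second returns null, because the closing apostrophe is preceded by a backslash (the second half of an escaped backslash), so a single lookbehind rejects it.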
0

This is for JavaScript:

/('|")(?:\\\\|\\\1|[\s\S])*?\1/

...

  • matches single- or double-quoted strings
  • matches empty strings (length 0)
  • matches strings with embedded whitespace (\n, \t, etc.)
  • skips internal escaped quotes (single or double)
  • skips single quotes inside double-quoted strings, and vice versa

Only the opening quote is captured. You can capture the string without its quotes in $2 with:

/('|")((?:\\\\|\\\1|[\s\S])*?)\1/
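
A minimal usage sketch of the capturing variant follows. The helper name `unquote` and the sample inputs are hypothetical.

```javascript
// Pattern from the answer: group 1 is the opening quote, group 2 the body.
const quoted = /('|")((?:\\\\|\\\1|[\s\S])*?)\1/;

// Hypothetical helper: returns the string body without its surrounding
// quotes, or null when no quoted string is found.
function unquote(src) {
  const m = quoted.exec(src);
  return m ? m[2] : null;
}

console.log(unquote(String.raw`var s = 'Hi everyone, we\'re ready now.';`));
console.log(unquote(String.raw`var s = "it's \"fine\"";`));
console.log(unquote("var s = '';")); // empty string body
```

The backreference `\1` ties the closing quote to whichever quote opened the string, which is why the apostrophe inside the double-quoted sample does not end the match.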

0

Source: https://habr.com/ru/post/891392/

