
How to simplify this regex to avoid recursion?

Regex:

(?|`(?>[^`\\]|\\.|``)*`|'(?>[^'\\]|\\.|'')*'|"(?>[^"\\]|\\.|"")*"|(\?{1,2})|(:{1,2})([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*))

Input Example:

INSERT INTO xyz WHERE 
a=? 
and b="what?"
and ??="cheese"
and `col?`='OK'
and ::col='another'
and last!=:least

https://regex101.com/r/HnTVXx/6

It should match ?, ??, :xyz and ::xyz, but not when they are inside a backtick-quoted, double-quoted or single-quoted string.

When I try to run this in PHP with very large input, I get PREG_RECURSION_LIMIT_ERROR from preg_last_error().

How can I simplify this regex pattern so that it doesn't do so much recursion?


Here is some test code that shows the error in PHP when using Niet's optimized regex: https://3v4l.org/GdtmP. It fails with error code 6 (PREG_JIT_STACKLIMIT_ERROR). The other error I have seen is code 3 (PREG_RECURSION_LIMIT_ERROR).
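For anyone reproducing this, here is a minimal sketch of how those error codes can be surfaced instead of silently getting no matches. The helper name find_placeholders is hypothetical (it is not taken from the linked test code); it only relies on preg_match_all() returning false on failure and on preg_last_error().

```php
<?php
// Hypothetical helper, not part of the original test code.
// Runs a pattern and turns a PCRE failure into an exception
// that names the error code reported by preg_last_error().
function find_placeholders(string $pattern, string $subject): array
{
    $names = [
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
        PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR',
    ];

    $count = preg_match_all($pattern, $subject, $matches);
    if ($count === false) {
        $code = preg_last_error();
        throw new RuntimeException($names[$code] ?? "PCRE error code $code");
    }

    return $matches[0]; // the full matches, in order of appearance
}
```

Without a check like this, a failed match is easy to mistake for "no placeholders found".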


To reduce the work the regex engine has to do (and to skip the quoted strings with (*SKIP)(*F)), you can use 2 tricks:

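First, first-character discrimination: instead of writing "...|'...|`...|:... you can write:

(?=["'`:])(?:"...|'...|`...|:...)

or

["'`:](?:(?<=")...|(?<=')...|(?<=`)...|(?<=:)...)

This way, any position that doesn't start with one of the characters ["'`:] is discarded immediately, without testing each branch of the alternation.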


Second, unrolling the loop: instead of " (?:[^"\\]|\\.)* " you can write:

" [^"\\]* (?: \\. [^"\\]* )* "

This takes the alternation out of the loop, so the engine needs fewer steps than with the basic version.
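As a quick sanity check, the sketch below runs the two forms of the double-quoted string pattern on a few inputs and compares what they match. The sample strings and the first_match helper are made up for illustration; the x and s modifiers are added so the patterns can be written with the same spacing as above.

```php
<?php
// Both patterns describe a double-quoted string with backslash escapes.
$alternation = <<<'RE'
~" (?:[^"\\]|\\.)* "~xs
RE;
$unrolled = <<<'RE'
~" [^"\\]* (?: \\. [^"\\]* )* "~xs
RE;

// Hypothetical sample inputs (single-quoted, so \" stays a backslash + quote).
$samples = [
    'say "a \"quoted\" word" now',
    'empty "" string',
    'unterminated "abc',
];

// Returns the first full match, or null when there is none.
function first_match(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

foreach ($samples as $s) {
    // true when both forms agree on this input
    var_dump(first_match($alternation, $s) === first_match($unrolled, $s));
}
```

The two forms are meant to accept the same strings; the unrolled one just gets there without entering an alternation for every character.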


With these 2 tricks, your pattern can be written like this:

~
[`'"?:]
(?:
    (?<=`) [^`\\]*+ (?s:\\.[^`\\]*|``[^`\\]*)*+ ` (*SKIP) (*F)
  |
    (?<=') [^'\\]*+ (?s:\\.[^'\\]*|''[^'\\]*)*+ ' (*SKIP) (*F)
  |
    (?<=") [^"\\]*+ (?s:\\.[^"\\]*|""[^"\\]*)*+ " (*SKIP) (*F)
  |
    (?<=\?) \??
  |
    (?<=:) :? ([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)
)
~x
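A minimal usage sketch for the pattern above, run with preg_match_all() against the input example from the question. The variable names are arbitrary, and a nowdoc is used only so the backslashes and quotes need no extra escaping.

```php
<?php
$pattern = <<<'RE'
~
[`'"?:]
(?:
    (?<=`) [^`\\]*+ (?s:\\.[^`\\]*|``[^`\\]*)*+ ` (*SKIP) (*F)
  |
    (?<=') [^'\\]*+ (?s:\\.[^'\\]*|''[^'\\]*)*+ ' (*SKIP) (*F)
  |
    (?<=") [^"\\]*+ (?s:\\.[^"\\]*|""[^"\\]*)*+ " (*SKIP) (*F)
  |
    (?<=\?) \??
  |
    (?<=:) :? ([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)
)
~x
RE;

// The input example from the question.
$sql = <<<'SQL'
INSERT INTO xyz WHERE
a=?
and b="what?"
and ??="cheese"
and `col?`='OK'
and ::col='another'
and last!=:least
SQL;

if (preg_match_all($pattern, $sql, $matches) === false) {
    throw new RuntimeException('PCRE error code ' . preg_last_error());
}

$found = $matches[0]; // full matches; group 1 holds the name after the colon(s)
print_r($found);
```

The question marks inside "what?" and `col?` are not reported, because the quoted strings are consumed and then discarded by (*SKIP) (*F).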


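Another way: instead of searching for the start of each match from every position (non-contiguous matches), you can build a pattern that returns contiguous matches. The design is: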

\G (all I don't want) (*SKIP) \K (what I am looking for)

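\G is an anchor that matches either at the start of the string or at the position where the previous match ended, so all matches are contiguous. Everything that must be ignored is consumed before \K, and only what follows \K is returned. (In the pattern below the A modifier is used in place of \G.)

The pattern: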

~
[^`'"?:]*
(?:
    ` [^`\\]*+ (?s:\\.[^`\\]*|``[^`\\]*)*+ ` [^`'"?:]*
  |
    ' [^'\\]*+ (?s:\\.[^'\\]*|''[^'\\]*)*+ ' [^`'"?:]*
  |
    " [^"\\]*+ (?s:\\.[^"\\]*|""[^"\\]*)*+ " [^`'"?:]*
)*
\K  # only the part of the match after this position is returned
(*SKIP) # if the next subpattern fails, the contiguity is broken at this position
(?:
    \?{1,2}
  |
    :{1,2} ([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)
)
~Ax
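Here is a small sketch of the contiguous version in use, again on the input example from the question. It assumes that preg_match_all() restarts each attempt where the previous match ended, which is what lets the A modifier stand in for \G here; variable names are arbitrary.

```php
<?php
$pattern = <<<'RE'
~
[^`'"?:]*
(?:
    ` [^`\\]*+ (?s:\\.[^`\\]*|``[^`\\]*)*+ ` [^`'"?:]*
  |
    ' [^'\\]*+ (?s:\\.[^'\\]*|''[^'\\]*)*+ ' [^`'"?:]*
  |
    " [^"\\]*+ (?s:\\.[^"\\]*|""[^"\\]*)*+ " [^`'"?:]*
)*
\K  # only the part of the match after this position is returned
(*SKIP) # if the next subpattern fails, the contiguity is broken at this position
(?:
    \?{1,2}
  |
    :{1,2} ([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)
)
~Ax
RE;

// The input example from the question.
$sql = <<<'SQL'
INSERT INTO xyz WHERE
a=?
and b="what?"
and ??="cheese"
and `col?`='OK'
and ::col='another'
and last!=:least
SQL;

if (preg_match_all($pattern, $sql, $matches) === false) {
    throw new RuntimeException('PCRE error code ' . preg_last_error());
}

$found = $matches[0]; // thanks to \K, only the placeholders themselves
print_r($found);
```

Each match swallows all the text and quoted strings in front of a placeholder, so the engine never has to retry from a position it has already walked over.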

You can use the "skip this, match that" idiom:

(don't match this(*SKIP)(*FAIL)|match this)

So, something like...

(
  (['"`]) # capture this quote character
  (?:\\.|(?!\1).)*+ # any escaped character, or
                    # any character that isn't the captured one
  \1      # the captured quote again
  (*SKIP)(*FAIL)   # ignore this
  |
  \?\??   # one or two question marks
  |
  ::?\w+  # word characters marked with one or two colons
)x

https://regex101.com/r/HnTVXx/7
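A minimal sketch of that pattern in PHP, on the small input example only. The ~ delimiters replace the outer parentheses purely for readability, which is a choice made for this sketch. Note that the question's update reports PREG_JIT_STACKLIMIT_ERROR for this pattern on very large input, so the return value is checked.

```php
<?php
$pattern = <<<'RE'
~
  (['"`]) # capture this quote character
  (?:\\.|(?!\1).)*+ # any escaped character, or
                    # any character that isn't the captured one
  \1      # the captured quote again
  (*SKIP)(*FAIL)   # ignore this
  |
  \?\??   # one or two question marks
  |
  ::?\w+  # word characters marked with one or two colons
~x
RE;

// The input example from the question.
$sql = <<<'SQL'
INSERT INTO xyz WHERE
a=?
and b="what?"
and ??="cheese"
and `col?`='OK'
and ::col='another'
and last!=:least
SQL;

if (preg_match_all($pattern, $sql, $matches) === false) {
    // On very large input this is where codes such as 6 would show up.
    throw new RuntimeException('PCRE error code ' . preg_last_error());
}

$found = $matches[0];
print_r($found);
```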


Source: https://habr.com/ru/post/1673730/

