How to simplify this regex to avoid recursion?
Regex:
(?|`(?>[^`\\]|\\.|``)*`|'(?>[^'\\]|\\.|'')*'|"(?>[^"\\]|\\.|"")*"|(\?{1,2})|(:{1,2})([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*))
Input Example:
INSERT INTO xyz WHERE
a=?
and b="what?"
and ??="cheese"
and `col?`='OK'
and ::col='another'
and last!=:least
https://regex101.com/r/HnTVXx/6
It should match ?, ??, :xyz and ::xyz, but not when they appear inside a backtick-quoted string, a double-quoted string, or a single-quoted string.
When I try to run this in PHP with very large input, I get PREG_RECURSION_LIMIT_ERROR from preg_last_error().
How can I simplify this regex pattern so that it doesn't do so much recursion?
Here is some test code that shows an error in PHP using Niet's optimized regular expression: https://3v4l.org/GdtmP. The error code is 6 (PREG_JIT_STACKLIMIT_ERROR). The other one I saw is 3 (PREG_RECURSION_LIMIT_ERROR).
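Yes, this can be done (skipping the quoted parts with (*SKIP)(*F)), using 2 tricks.
An alternation at the very start of a pattern is costly: at every position of the subject, the engine has to try each branch in turn before it can move on.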
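The first trick is to discriminate on the first character. Instead of starting with an alternation such as "...|'...|`...|:... you can write one of these: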
(?=["'`:])(?:"...|'...|`...|:...)
["'`:](?:(?<=")...|(?<=')...|(?<=`)...|(?<=:)...)
This way, a position that does not start with one of ["'`:] fails on a single test, without trying each branch.
The second trick is to unroll the loop. Instead of " (?:[^"\\]|\\.)* " you can write:
" [^"\\]* (?: \\. [^"\\]* )* "
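As a quick sanity check that the unrolled form accepts the same strings as the alternation form, here is a minimal PHP sketch. The sample subjects and variable names are made up for illustration. Both patterns are given the x and s modifiers so that the spaces are ignored and \\. can also match an escaped newline; that is an assumption, since the snippets above show no modifiers.

```php
<?php
// Alternation form: the group is entered once per character.
$alternation = <<<'REGEX'
~" (?:[^"\\]|\\.)* "~xs
REGEX;

// Unrolled form: runs of ordinary characters are consumed in one step,
// and the group is only entered when a backslash escape is found.
$unrolled = <<<'REGEX'
~" [^"\\]* (?: \\. [^"\\]* )* "~xs
REGEX;

// Hypothetical sample subjects (not from the original post).
$subjects = [
    'say "hi \"there\"" now',
    'empty "" here',
    'no quotes at all',
    'unterminated "abc\"',
];

$same = true;
foreach ($subjects as $subject) {
    $countA = preg_match($alternation, $subject, $a);
    $countB = preg_match($unrolled, $subject, $b);
    // Both forms should agree on whether there is a match and on its text.
    if ($countA !== $countB || $a !== $b) {
        $same = false;
    }
    printf("%-24s => %s\n", $subject, $countB ? $b[0] : '(no match)');
}

echo $same ? 'both forms agree' : 'forms differ', PHP_EOL;
```

The two forms describe the same set of strings; the difference is only in how much work the engine does per character.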
Using these 2 tricks, the pattern can be written as:
~
[`'"?:]
(?:
(?<=`) [^`\\]*+ (?s:\\.[^`\\]*|``[^`\\]*)*+ ` (*SKIP) (*F)
|
(?<=') [^'\\]*+ (?s:\\.[^'\\]*|''[^'\\]*)*+ ' (*SKIP) (*F)
|
(?<=") [^"\\]*+ (?s:\\.[^"\\]*|""[^"\\]*)*+ " (*SKIP) (*F)
|
(?<=\?) \??
|
(?<=:) :? ([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)
)
~x
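A minimal sketch of how this pattern might be run against the sample input from the question. The variable names are illustrative, and nowdoc strings are used so that backslashes and quotes reach PCRE unchanged.

```php
<?php
// Sample input from the question.
$sql = <<<'SQL'
INSERT INTO xyz WHERE
a=?
and b="what?"
and ??="cheese"
and `col?`='OK'
and ::col='another'
and last!=:least
SQL;

// The pattern above, copied verbatim (x modifier: whitespace is ignored).
$pattern = <<<'REGEX'
~
[`'"?:]
(?:
    (?<=`) [^`\\]*+ (?s:\\.[^`\\]*|``[^`\\]*)*+ ` (*SKIP) (*F)
  |
    (?<=') [^'\\]*+ (?s:\\.[^'\\]*|''[^'\\]*)*+ ' (*SKIP) (*F)
  |
    (?<=") [^"\\]*+ (?s:\\.[^"\\]*|""[^"\\]*)*+ " (*SKIP) (*F)
  |
    (?<=\?) \??
  |
    (?<=:) :? ([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)
)
~x
REGEX;

$count = preg_match_all($pattern, $sql, $matches);

// Always check preg_last_error(): on failure PHP does not throw.
if ($count === false || preg_last_error() !== PREG_NO_ERROR) {
    echo 'PCRE error code: ', preg_last_error(), PHP_EOL;
} else {
    echo implode(' ', $matches[0]), PHP_EOL; // ? ?? ::col :least
}
```

The placeholders inside "what?" and `col?` are not reported, because the quoted strings are consumed and then discarded by (*SKIP) (*F).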
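Another way: instead of searching for each target separately, describe the string as a series of contiguous matches (each one starting where the previous one ended), consume everything you don't want, and return only what you are looking for. The general shape is: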
\G (all I don't want) (*SKIP) \K (what I am looking for)
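\G is an anchor that matches at the position where the previous match ended, or at the start of the string for the first match. It guarantees that all the matches are contiguous. In the pattern below, the A modifier (anchored) plays the same role.
The pattern: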
~
[^`'"?:]*
(?:
` [^`\\]*+ (?s:\\.[^`\\]*|``[^`\\]*)*+ ` [^`'"?:]*
|
' [^'\\]*+ (?s:\\.[^'\\]*|''[^'\\]*)*+ ' [^`'"?:]*
|
" [^"\\]*+ (?s:\\.[^"\\]*|""[^"\\]*)*+ " [^`'"?:]*
)*
\K # only the part of the match after this position is returned
(*SKIP) # if the next subpattern fails, the contiguity is broken at this position
(?:
\?{1,2}
|
:{1,2} ([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)
)
~Ax
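A minimal sketch of this second pattern in use, again on the sample input from the question (variable names are illustrative). Everything before \K is consumed but not returned, so each reported match is only the placeholder itself.

```php
<?php
// Sample input from the question.
$sql = <<<'SQL'
INSERT INTO xyz WHERE
a=?
and b="what?"
and ??="cheese"
and `col?`='OK'
and ::col='another'
and last!=:least
SQL;

// The pattern above, copied verbatim (A: anchored, x: extended syntax).
$pattern = <<<'REGEX'
~
[^`'"?:]*
(?:
    ` [^`\\]*+ (?s:\\.[^`\\]*|``[^`\\]*)*+ ` [^`'"?:]*
  |
    ' [^'\\]*+ (?s:\\.[^'\\]*|''[^'\\]*)*+ ' [^`'"?:]*
  |
    " [^"\\]*+ (?s:\\.[^"\\]*|""[^"\\]*)*+ " [^`'"?:]*
)*
\K       # only the part of the match after this position is returned
(*SKIP)  # if the next subpattern fails, the contiguity is broken at this position
(?:
    \?{1,2}
  |
    :{1,2} ([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)
)
~Ax
REGEX;

$count = preg_match_all($pattern, $sql, $matches);

if ($count === false || preg_last_error() !== PREG_NO_ERROR) {
    echo 'PCRE error code: ', preg_last_error(), PHP_EOL;
} else {
    echo implode(' ', $matches[0]), PHP_EOL; // ? ?? ::col :least
}
```

Because every attempt is anchored where the previous match ended, the search stops as soon as no further placeholder can be reached from that position.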
The usual "match this, but not that" idiom looks like this:
(don't match this(*SKIP)(*FAIL)|match this)
In your case, something like:
(
(['"`]) # capture this quote character
(?:\\.|(?!\1).)*+ # any escaped character, or
# any character that isn't the captured one
\1 # the captured quote again
(*SKIP)(*FAIL) # ignore this
|
\?\?? # one or two question marks
|
::?\w+ # word characters marked with one or two colons
)x
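A minimal sketch of this pattern in PHP on the sample input from the question (variable names are illustrative). Note that the outer ( and ) are used here as the pattern delimiters, which PHP allows, so \1 still refers to the captured quote character.

```php
<?php
// Sample input from the question.
$sql = <<<'SQL'
INSERT INTO xyz WHERE
a=?
and b="what?"
and ??="cheese"
and `col?`='OK'
and ::col='another'
and last!=:least
SQL;

// The pattern above, copied verbatim; "(" and ")" are the delimiters.
$pattern = <<<'REGEX'
(
  (['"`])           # capture this quote character
  (?:\\.|(?!\1).)*+ # any escaped character, or
                    # any character that isn't the captured one
  \1                # the captured quote again
  (*SKIP)(*FAIL)    # ignore this
|
  \?\??             # one or two question marks
|
  ::?\w+            # word characters marked with one or two colons
)x
REGEX;

$count = preg_match_all($pattern, $sql, $matches);

if ($count === false || preg_last_error() !== PREG_NO_ERROR) {
    echo 'PCRE error code: ', preg_last_error(), PHP_EOL;
} else {
    echo implode(' ', $matches[0]), PHP_EOL; // ? ?? ::col :least
}
```

On very large inputs the question reports error code 6 (PREG_JIT_STACKLIMIT_ERROR) with this pattern, so the preg_last_error() check is worth keeping.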