When a string matches a regular expression, what happens behind the scenes?

I would be interested to know which algorithms are used to match it and how they are optimized, because I imagine that somes regexes can create a huge number of possible matches that can cause serious problems with a poorly-worn regular expression parser.

Also, I recently discovered the ReDoS concept, why are regular expressions like (a|aa)+ or (a|a?)+ Causing problems?

EDIT: I used them most in C # and Python, so that was in my mind when I considered the question. I assume Python is written in C like all other interpreters, but I have no idea about C #

+6
source share
3 answers

There are two kinds of regex mechanism: NFA and DFA. I'm pretty rusty, so I dare not go into details from memory. Here is a page that goes through the algorithms. Some parsers will work better with poorly written expressions. A good book on the topic (which sits on my shelf) Mastering the regular expression .

0
source

I find http://www.regular-expressions.info has really useful information about regular expressions.

In particular, the author talks about the catastrophic use of regular expressions .

+2
source

Regex Buddy has this debug page, which "offers you a unique look inside the regex engine."

http://www.regexbuddy.com/debug.html

+1
source

Source: https://habr.com/ru/post/889986/


All Articles