When a string matches a regular expression, what happens behind the scenes?

Question

When a string matches a regular expression, what happens behind the scenes?

I would be interested to know which algorithms are used to match it and how they are optimized, because I imagine that somes regexes can create a huge number of possible matches that can cause serious problems with a poorly-worn regular expression parser.

Also, I recently discovered the ReDoS concept, why are regular expressions like (a|aa)+ or (a|a?)+ Causing problems?

EDIT: I used them most in C # and Python, so that was in my mind when I considered the question. I assume Python is written in C like all other interpreters, but I have no idea about C #

+6

regex

espekia Jun 07 '11 at 18:57

source share

3 answers

I find http://www.regular-expressions.info has really useful information about regular expressions.

In particular, the author talks about the catastrophic use of regular expressions .

+2

marto Jun 07 '11 at 19:07

source share

Regex Buddy has this debug page, which "offers you a unique look inside the regex engine."

http://www.regexbuddy.com/debug.html

+1

manojlds Jun 07 '11 at 19:15

source share

Alex hirzel · Accepted Answer · 2011-06-07T19:10:24+0000

There are two kinds of regex mechanism: NFA and DFA. I'm pretty rusty, so I dare not go into details from memory. Here is a page that goes through the algorithms. Some parsers will work better with poorly written expressions. A good book on the topic (which sits on my shelf) Mastering the regular expression .

When a string matches a regular expression, what happens behind the scenes?

More articles: