What does "two-level regular expressions" mean?

I understand basic regex, but I don't know what the quote below means (it is about how to implement a wiki parser). Can anyone provide some pseudocode to enlighten me?

Two-Level Regular Expressions

This is a very popular approach. This is pretty fast because it scans the source exactly two times.

The idea is to create two kinds of regular expressions: one splits the text into blocks of different types (paragraphs, headings, lists, preformatted blocks, etc.), and the other then processes each block with ordinary character-level regular expressions.

Quote: http://www.wikicreole.org/wiki/CommonWikiParsingTechniques

+4
2 answers

This means that you do not try to perform several tasks in one regular expression, but divide the work into two tasks (two levels): first splitting the text into tokens, then processing each token separately.

In my opinion, people are often tempted to do too much with a single regex, when breaking the work up into separate tasks like this would make things a lot easier.
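Since the question asks for pseudocode, here is a minimal Python sketch of the split-then-process idea. The markup it accepts (`=` headings, `* ` list items, `{{{ }}}` preformatted blocks, `**bold**`, `//italic//`) is only a small Creole-like subset chosen for illustration, and the names `BLOCK_RE`, `INLINE_RE`, `render_inline` and `render` are made up for this example; it is not the parser of any particular wiki engine.

```python
import html
import re

# Level 1: block-level pattern. Each top-level alternative is one block type.
BLOCK_RE = re.compile(
    r"""
      (?P<pre>^\{\{\{\n[\s\S]*?\n\}\}\}$)                       # preformatted block
    | (?P<heading>^=+[^\n]*$)                                   # heading line
    | (?P<list>(?:^\*[ ][^\n]*\n?)+)                            # run of list items
    | (?P<para>(?:^(?![=]|\*[ ]|\{\{\{)[^\n]*\S[^\n]*\n?)+)     # run of other non-blank lines
    """,
    re.MULTILINE | re.VERBOSE,
)

# Level 2: character-level pattern, applied inside a single block.
INLINE_RE = re.compile(r"\*\*(?P<bold>.+?)\*\*|//(?P<italic>.+?)//")


def render_inline(text):
    """Second level: escape HTML, then convert bold and italic markers."""
    def repl(m):
        if m.group("bold") is not None:
            return "<strong>" + m.group("bold") + "</strong>"
        return "<em>" + m.group("italic") + "</em>"
    return INLINE_RE.sub(repl, html.escape(text))


def render(source):
    """First level: split into typed blocks, then hand each one to level 2."""
    out = []
    for m in BLOCK_RE.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind == "pre":
            body = text[4:-4]  # drop the "{{{\n" and "\n}}}" delimiters
            out.append("<pre>" + html.escape(body) + "</pre>")
        elif kind == "heading":
            level = min(len(text) - len(text.lstrip("=")), 6)
            title = text.strip("= \t")
            out.append(f"<h{level}>{render_inline(title)}</h{level}>")
        elif kind == "list":
            items = [line[2:] for line in text.splitlines()]  # drop "* "
            out.append(
                "<ul>" + "".join(f"<li>{render_inline(i)}</li>" for i in items) + "</ul>"
            )
        else:  # paragraph: join its lines into one
            out.append("<p>" + render_inline(" ".join(text.split())) + "</p>")
    return "\n".join(out)
```

Note that the preformatted block never reaches the second level, which is one practical benefit of the split: the inline pattern does not need to know where markup is disabled.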

+5

It seems that "two-level regular expressions" is a (slightly ambiguous) term for what I have recommended in a few answers here on StackOverflow for parsing a slightly complex (but still regular) language.

An example is getting all the img src= URLs from an HTML page. It is possible (but rather messy) to do this in a single regex; it makes more sense to use one regular expression to get all the <img> tags (capturing the entire tag), and then use another regular expression to get src="http://some-url-here.com" from each match. This makes the code more readable, and you still only look at the text twice.
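A rough Python sketch of that example follows. The names `IMG_TAG_RE`, `SRC_RE` and `image_urls` are invented here, and the patterns assume reasonably well-formed tags with quoted attribute values, so treat it as an illustration of the two passes rather than a robust HTML scraper.

```python
import re

# Level 1: capture each complete <img ...> tag.
IMG_TAG_RE = re.compile(r"<img\b[^>]*>", re.IGNORECASE)

# Level 2: inside one tag, find a quoted src attribute.
# The lookbehind avoids matching attributes such as data-src.
SRC_RE = re.compile(
    r"""(?<![\w-])src\s*=\s*(?:"([^"]*)"|'([^']*)')""",
    re.IGNORECASE,
)


def image_urls(page):
    """Return the src value of every <img> tag that has one."""
    urls = []
    for tag in IMG_TAG_RE.findall(page):       # first pass: whole page
        m = SRC_RE.search(tag)                 # second pass: one tag only
        if m:
            urls.append(m.group(1) if m.group(1) is not None else m.group(2))
    return urls
```

Because the second pattern only ever sees the text of one tag, it can stay simple; a `src=` that appears in ordinary page text or in another element is never examined.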

+3

Source: https://habr.com/ru/post/1391491/

