Regex negative lookahead to match Markdown links

Question

We are facing a regular expression problem.

Here is the problem. Consider the following two patterns:

1) [hello] [world]

2) [hello [world]]

We need to write a regular expression that matches only [world] in the first case and the entire pattern ([hello [world]]) in the second.

Using a negative lookahead, I wrote the following regular expression that solves part of the problem:

 \[[^\[\]]+\](?!.*\[[^\[\]]+\])

This regular expression matches pattern 1) as we want, but does not work for pattern 2).
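To make the failure concrete, here is a minimal sketch that runs the pattern above against both inputs. `LastFlat` is a hypothetical helper name, and the snippet assumes C# 9 or later so that top-level statements are available.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the question: a flat [...] that is not followed
// by another flat [...] later on the same line.
const string pattern = @"\[[^\[\]]+\](?!.*\[[^\[\]]+\])";

// Hypothetical helper: returns the matched text, or "" when nothing matches.
string LastFlat(string input) => Regex.Match(input, pattern).Value;

Console.WriteLine(LastFlat("[hello] [world]"));  // [world]
Console.WriteLine(LastFlat("[hello [world]]"));  // [world] (only the inner pair)
```

Because [^\[\]]+ cannot cross a bracket, the pattern can only ever match an innermost pair, which is why the second input yields [world] rather than the whole string.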


regex .net regex-lookarounds

Enrico Massone Oct 20 '17 at 13:03


3 answers

Wiktor Stribiżew · Answer 1 · 2017-10-20T13:11:00+0000

In .NET regex, you can use balancing groups to match nested balanced brackets. So, to match the last [...] substring (with nested brackets) on a line, you need a rather long pattern, for example

 \[(?:[^][]+|(?<c>)\[|(?<-c>)])*(?(c)(?!))](?!.*\[(?:[^][]+|(?<d>)\[|(?<-d>)])*(?(d)(?!))])

See the regex demo at RegexStorm.net.

More details

\[(?:[^][]+|(?<c>)\[|(?<-c>)])*(?(c)(?!))] - a [...] substring that may contain nested brackets:
- \[ - a [ char
- (?:[^][]+|(?<c>)\[|(?<-c>)])* - zero or more occurrences of:
  - [^][]+| - 1 or more chars other than ] and [, or
  - (?<c>)\[| - an empty value is pushed onto the group "c" stack and a [ is matched, or
  - (?<-c>)] - an empty value is popped from the group "c" stack and a ] is matched
- (?(c)(?!)) - a conditional that fails the match if the group "c" stack is not empty
- ] - a ] char
(?!.*\[(?:[^][]+|(?<d>)\[|(?<-d>)])*(?(d)(?!))]) - a negative lookahead that fails the match if, after any 0+ chars other than newline, the same pattern as above can be matched again (that is, if another balanced [...] substring follows on the line).
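As an illustration of the breakdown above, the following sketch assembles the same pattern from one reusable fragment and applies it to the two inputs from the question. The helper names `Balanced` and `LastBalanced` are hypothetical, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Builds the "balanced [...]" fragment for a given stack group name.
string Balanced(string g) =>
    @"\[(?:[^][]+|(?<" + g + @">)\[|(?<-" + g + @">)])*(?(" + g + @")(?!))]";

// A balanced [...] that is not followed by another balanced [...] on the line.
// This concatenation yields exactly the pattern shown in the answer.
string pattern = Balanced("c") + "(?!.*" + Balanced("d") + ")";

// Returns the matched text, or "" when nothing matches.
string LastBalanced(string input) => Regex.Match(input, pattern).Value;

Console.WriteLine(LastBalanced("[hello] [world]"));  // [world]
Console.WriteLine(LastBalanced("[hello [world]]"));  // [hello [world]]
```

Building the pattern from a fragment is only a readability choice; the two copies need different group names (c and d) exactly as in the original pattern.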

Davide Icardi · Answer 2 · 2017-10-20T16:05:39+0000

Here is another possible solution that matches all Markdown links, provided the brackets inside them are "properly" escaped.

Here's the regex:

 \[(?<text>(?:[^\[\]]|\\\[|\\\])+?)\]\((?<link>.+?)\)

See the regex101 demo.

Note that this does not support unescaped brackets inside the link text:

 [link number \[2]](http://myurl.com)
 [link number [2\]](http://myurl.com)

It may also not support other corner cases ...
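A short sketch of how the named groups could be read in C#, using one properly escaped link and one of the unsupported inputs above. `Describe` is a hypothetical helper, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from this answer, with the named groups "text" and "link".
const string linkPattern = @"\[(?<text>(?:[^\[\]]|\\\[|\\\])+?)\]\((?<link>.+?)\)";

// Hypothetical helper: formats the two named groups, or reports a failed match.
string Describe(string input)
{
    Match m = Regex.Match(input, linkPattern);
    return m.Success
        ? m.Groups["text"].Value + " -> " + m.Groups["link"].Value
        : "(no match)";
}

// Both inner brackets escaped: the whole link is matched.
Console.WriteLine(Describe(@"[link number \[2\]](http://myurl.com)"));
// link number \[2\] -> http://myurl.com

// Closing inner bracket left unescaped: nothing is matched.
Console.WriteLine(Describe(@"[link number \[2]](http://myurl.com)"));
// (no match)
```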

Casimir et Hippolyte · Answer 3 · 2017-10-20T21:27:37+0000

An easier way to find the last balanced pair of square brackets in a string with .NET regex is to search the string from right to left using the RegexOptions.RightToLeft option. This way you avoid:

- searching the whole string for nothing
- checking the rest of the line with a lookahead, since the pattern returns the first match from the right.

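Code:

 using System;
 using System.Text.RegularExpressions;

 string input = @"[hello] [world] [hello [world\]] ]";
 // Read right to left: closing ], balanced body, opening [, then the empty-stack check.
 string rtlPattern = @"(?(c)(?!))\[(?>\\.|(?<!\\)[^][]+|(?<-c>)\[|(?<c>)])*]";

 // RightToLeft starts the search at the end of the string,
 // so the first match found is the last balanced [...] group.
 Match m = Regex.Match(input, rtlPattern, RegexOptions.RightToLeft);

 if (m.Success)
     Console.WriteLine("Result: {0}", m.Groups[0].Value);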

demo

Note that to understand what is happening, you also need to read the parts of the pattern from right to left. Details:

 ]              # a literal closing square bracket
 (?>            # open an atomic group (*)
     \\.        # any character escaped with a backslash
   |
     [^][]+     # anything that isn't a square bracket
     (?<!\\)    # not preceded by a backslash
   |
     (?<-c>) \[ # decrement the c stack for an opening bracket
   |
     (?<c>) ]   # increment the c stack for a closing bracket
 )*             # repeat zero or more times
 \[             # a literal opening square bracket
 (?(c)          # conditional statement: true if c isn't empty
     (?!)       # always failing pattern: "not followed by nothing"
 )

(*) Note that using an atomic group is mandatory here to avoid possible catastrophic backtracking, since the group contains an element with a + quantifier and is itself repeated. You can learn more about this issue here.

This pattern already handles escaped nested brackets, and you can also add the RegexOptions.Singleline option if you want to match a part that contains a newline character.
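For comparison with the other answers, this sketch applies the same right-to-left pattern to the two inputs from the question. `LastBalanced` is a hypothetical helper name, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// The right-to-left pattern from this answer, unchanged.
const string rtlPattern = @"(?(c)(?!))\[(?>\\.|(?<!\\)[^][]+|(?<-c>)\[|(?<c>)])*]";

// Returns the last balanced [...] group, or "" when there is none.
string LastBalanced(string input) =>
    Regex.Match(input, rtlPattern, RegexOptions.RightToLeft).Value;

Console.WriteLine(LastBalanced("[hello] [world]"));  // [world]
Console.WriteLine(LastBalanced("[hello [world]]"));  // [hello [world]]
```

No lookahead is needed here: scanning starts at the end of the string, so the first successful match is already the last group.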
