Regex negative lookahead to match Markdown links

We have run into a regular expression problem.

Here is the problem. Consider the following two patterns:

1) [hello] [world]

2) [hello [world]]

We need to write a regular expression that matches only [world] in the first pattern and the entire string ( [hello [world]] ) in the second.

Using a negative lookahead, I wrote the following regular expression that solves part of the problem:

 \[[^\[\]]+\](?!.*\[[^\[\]]+\]) 

This regular expression matches pattern 1) as we want, but does not work for pattern 2): because [^\[\]]+ cannot cross a nested bracket, it matches only the inner [world] there.
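The behaviour described above can be reproduced with a short sketch. It uses Python's `re` module only because this particular pattern needs no .NET-specific features; the helper name `last_flat_bracket` is made up for illustration.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"\[[^\[\]]+\](?!.*\[[^\[\]]+\])")


def last_flat_bracket(text):
    """Return the text matched by the question's pattern, or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


# Pattern 1): the last flat [...] group is what we want.
print(last_flat_bracket("[hello] [world]"))   # [world]

# Pattern 2): only the inner group is found, not the whole string.
print(last_flat_bracket("[hello [world]]"))   # [world]
```

The second call shows the gap: the character class `[^\[\]]+` stops at the first nested bracket, so the outer pair can never be matched.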

3 answers

In .NET regex, you can use balancing groups to match nested balanced brackets. So, to match the last [...] substring (including nested brackets) on a line, you need a rather long pattern, for example:

 \[(?:[^][]+|(?<c>)\[|(?<-c>)])*(?(c)(?!))](?!.*\[(?:[^][]+|(?<d>)\[|(?<-d>)])*(?(d)(?!))]) 

See the regex demo at RegexStorm.net.

Details:

  • \[(?:[^][]+|(?<c>)\[|(?<-c>)])*(?(c)(?!))] - a [...] substring with nested brackets:
    • \[ - a [ char
    • (?:[^][]+|(?<c>)\[|(?<-c>)])* - zero or more occurrences of:
      • [^][]+| - 1 or more characters other than ] and [, or
      • (?<c>)\[| - an empty value is pushed onto the group "c" stack and a [ is matched, or
      • (?<-c>)] - an empty value is popped from the group "c" stack and a ] is matched
    • (?(c)(?!)) - a conditional that fails the match if the group "c" stack is not empty
    • ] - a ] char
  • (?!.*\[(?:[^][]+|(?<d>)\[|(?<-d>)])*(?(d)(?!))]) - a negative lookahead: to the right of the match there must not be 0+ characters other than newlines followed by the same pattern as above.
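To make the push/pop bookkeeping above concrete, here is a minimal sketch of the same idea without a regex. Python's `re` module has no balancing groups, so this is a plain stack-based scan, not a port of the .NET pattern; the function name `last_balanced_bracket` is hypothetical, and escaped brackets are not handled.

```python
def last_balanced_bracket(text):
    """Return the balanced [...] group whose closing bracket comes last."""
    open_positions = []   # plays the role of the "c" stack
    last_span = None      # (start, end) of the most recently closed pair

    for index, char in enumerate(text):
        if char == "[":
            open_positions.append(index)          # push, like (?<c>)\[
        elif char == "]" and open_positions:
            start = open_positions.pop()          # pop, like (?<-c>)]
            last_span = (start, index)            # later closes overwrite earlier ones

    if last_span is None:
        return None
    start, end = last_span
    return text[start:end + 1]


print(last_balanced_bracket("[hello] [world]"))   # [world]
print(last_balanced_bracket("[hello [world]]"))   # [hello [world]]
```

On the two inputs from the question this gives the results the question asks for; behaviour on other inputs has not been compared against the .NET pattern.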

Here is another possible solution that matches all Markdown links, provided that brackets are “properly” escaped.

Here's the regex:

 \[(?<text>(?:[^\[\]]|\\\[|\\\])+?)\]\((?<link>.+?)\) 

See the regex101 demo.

Note that this does not support unescaped brackets inside the link text, such as:

 [link number \[2]](http://myurl.com)
 [link number [2\]](http://myurl.com)

It may also not support other cases of cross ...
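A small sketch of how this pattern behaves, assuming it is adapted to Python: the only change is that the named groups are written `(?P<text>...)` and `(?P<link>...)`, which is the syntax Python's `re` module requires. The helper `parse_link` is made up for illustration.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
LINK = re.compile(r"\[(?P<text>(?:[^\[\]]|\\\[|\\\])+?)\]\((?P<link>.+?)\)")


def parse_link(markup):
    """Return (text, link) for the first link found, or None."""
    match = LINK.search(markup)
    if match is None:
        return None
    return match.group("text"), match.group("link")


# A plain link and a link whose inner brackets are both escaped.
print(parse_link("[link](http://myurl.com)"))
print(parse_link(r"[link number \[2\]](http://myurl.com)"))

# Only the opening inner bracket is escaped: no link is found.
print(parse_link(r"[link number \[2]](http://myurl.com)"))
```

The captured text keeps the backslashes, so a caller that wants the display text would still need to strip the escapes.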


An easier way to find the last balanced pair of square brackets in a string with .NET regex is to search the string from right to left using the RegexOptions.RightToLeft option. This way you avoid:

  • searching the entire string needlessly
  • checking the rest of the line with a lookahead, since the search returns the first match from the right.

Code:

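 using System;
 using System.Text.RegularExpressions;

 string input = @"[hello] [world] [hello [world\]] ]";
 // The pattern is matched from right to left, so read it from right to left too.
 string rtlPattern = @"(?(c)(?!))\[(?>\\.|(?<!\\)[^][]+|(?<-c>)\[|(?<c>)])*]";

 // The first match found when scanning from the right is the last balanced [...] group.
 Match m = Regex.Match(input, rtlPattern, RegexOptions.RightToLeft);

 if (m.Success)
     Console.WriteLine("Result: {0}", m.Groups[0].Value);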


Please note that, to understand what is happening, you also need to read the parts of the pattern from right to left. Details:

 ]               # a literal closing square bracket
 (?>             # open an atomic group (*)
     \\.         # any character escaped with a backslash
   |
     [^][]+      # anything that isn't a square bracket
     (?<!\\)     # not preceded by a backslash
   |
     (?<-c>) \[  # decrement the c stack for an opening bracket
   |
     (?<c>) ]    # increment the c stack for a closing bracket
 )*              # repeat zero or more times
 \[              # a literal opening square bracket
 (?(c)           # conditional: true if c isn't empty
     (?!)        # always-failing pattern: "not followed by nothing"
 )

(*) Please note that the atomic group is mandatory here to avoid possible catastrophic backtracking, since the group contains an element with a + quantifier and is itself repeated. You can learn more about this issue here.

This pattern already handles escaped nested brackets, and you can also add the RegexOptions.Singleline option if you want to match a part that contains a newline character.


Source: https://habr.com/ru/post/1272775/

