Why does my PHP regular expression for parsing Markdown links break?

Question

 $pattern = "/\[(.*?)\]\((.*?)\)/i";
 $replace = "<a href=\"$2\" rel=\"nofollow\">$1</a>";
 $text = "blah blah [LINK1](http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?";
 echo preg_replace($pattern, $replace, $text);

The above works, but if a space is accidentally inserted between [] and (), everything breaks and the two links are mixed into one:

 $text = "blah blah [LINK1] (http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?";

I have a feeling that it's the star that breaks it, but I don't know how else to match repeated links.

php regex markdown

user1070125 May 13 '12 at 11:27

2 answers

Try the following:

 $pattern = "/\[(.*?)\]\s?\((.*?)\)/i";

\s? added between \[(.*?)\] and \((.*?)\)
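As a rough check of this suggestion, the sketch below runs the pattern on shortened versions of the question's input. The variable names and the two-space input are assumptions made for illustration. Since \s? matches at most one whitespace character, a single stray space is handled, but two spaces bring back the merged link.

```php
<?php
// Pattern from this answer: at most one whitespace character between ] and (.
$pattern = '/\[(.*?)\]\s?\((.*?)\)/i';
$replace = '<a href="$2" rel="nofollow">$1</a>';

// Shortened inputs based on the question (hypothetical variants).
$oneSpace  = "[LINK1] (http://example.com) blah [LINK2](http://sub.example.com/)";
$twoSpaces = "[LINK1]  (http://example.com) blah [LINK2](http://sub.example.com/)";

$oneResult = preg_replace($pattern, $replace, $oneSpace);
$twoResult = preg_replace($pattern, $replace, $twoSpaces);

echo $oneResult, PHP_EOL; // two separate <a> tags
echo $twoResult, PHP_EOL; // one <a> tag spanning both links
```

If more than one space has to be tolerated, \s* (or a literal space followed by *) in place of \s? is the usual adjustment.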

Karo May 13 '12 at 11:31

Jarmex · Accepted Answer · 2012-05-13T11:31:30+0000

If I understand you correctly, all you need to do is also match any number of spaces between the two, for example:

 /\[([^]]*)\] *\(([^)]*)\)/i

Explanation:

 \[         # Matches the opening square bracket (escaped)
 ([^]]*)    # Captures any number of characters that aren't close square brackets
 \]         # Matches the close square bracket (escaped)
  *         # Matches any number of spaces (a literal space followed by *)
 \(         # Matches the opening bracket (escaped)
 ([^)]*)    # Captures any number of characters that aren't close brackets
 \)         # Matches the close bracket (escaped)
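A minimal sketch of the pattern in use, reusing the replacement string and the space-containing input from the question (the variable name $html is an assumption for illustration):

```php
<?php
// Pattern from this answer: any number of spaces allowed between ] and (.
$pattern = '/\[([^]]*)\] *\(([^)]*)\)/i';
$replace = '<a href="$2" rel="nofollow">$1</a>';

// Input from the question, with the accidental space after [LINK1].
$text = "blah blah [LINK1] (http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?";

$html = preg_replace($pattern, $replace, $text);
echo $html, PHP_EOL;
// blah blah <a href="http://example.com" rel="nofollow">LINK1</a> blah <a href="http://sub.example.com/" rel="nofollow">LINK2</a> blah blah ?
```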

Justification:

I should probably justify why I changed your .*? to [^]]*.

The second version is more efficient because it doesn't need the large amount of backtracking that .*? does. In addition, once an opening [ is found, the .*? version will carry on searching until it finds a match, instead of failing when this is not the tag we want. For example, if we match the expression using .*? against:

 Sad face :[ blah [LINK1](http://sub.example.com/) blah

it will match

 [ blah [LINK1]

and

 http://sub.example.com/

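Using the [^]]* approach means the captured text can never run past a closing ]. Note, though, that [^]]* still accepts an opening [, so for this particular input the match still begins at the stray [. Excluding both brackets, for example with [^\[\]]*, makes the match begin at [LINK1] instead.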
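The example above can be checked with a short sketch. The $strict pattern is a hypothetical variant that is not part of the original answer; it excludes both [ and ] from the link text. Because [^]]* only excludes ], it behaves like .*? on this particular input, and only the stricter class skips the stray bracket.

```php
<?php
$text = "Sad face :[ blah [LINK1](http://sub.example.com/) blah";

// Lazy version from the question (with optional spaces added).
$lazy = '/\[(.*?)\] *\((.*?)\)/i';
// Negated-class version from this answer.
$negated = '/\[([^]]*)\] *\(([^)]*)\)/i';
// Hypothetical stricter variant: the link text may contain neither [ nor ].
$strict = '/\[([^\[\]]*)\] *\(([^)]*)\)/i';

$labels = [];
foreach (['lazy' => $lazy, 'negated' => $negated, 'strict' => $strict] as $name => $pattern) {
    preg_match($pattern, $text, $m);
    $labels[$name] = $m[1]; // first capture group: the link text
    echo $name, ': "', $m[1], '"', PHP_EOL;
}
// lazy: " blah [LINK1"
// negated: " blah [LINK1"
// strict: "LINK1"
```

The negated class still pays off on the question's own input: it cannot cross the ] after LINK1, so two links are never merged into one.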
