Why does my PHP regular expression for parsing Markdown links break?
$pattern = "/\[(.*?)\]\((.*?)\)/i";
$replace = "<a href=\"$2\" rel=\"nofollow\">$1</a>";
$text = "blah blah [LINK1](http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?";
echo preg_replace($pattern, $replace, $text);
The above works, but if a space is accidentally inserted between the [] and the (), everything breaks and the two links are merged into one:
$text = "blah blah [LINK1] (http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?";
I have a feeling that the star is what breaks it, but I don't know how else to match repeated links.
If I understand you correctly, all you need to do is also match any number of spaces between the two parts, for example:
/\[([^]]*)\] *\(([^)]*)\)/i
Explanation:
\[         # Matches the opening square bracket (escaped)
([^]]*)    # Captures any number of characters that aren't closing square brackets
\]         # Matches the closing square bracket (escaped)
 *         # Matches any number of spaces
\(         # Matches the opening parenthesis (escaped)
([^)]*)    # Captures any number of characters that aren't closing parentheses
\)         # Matches the closing parenthesis (escaped)
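As a quick check, here is a minimal sketch that runs this pattern over the sample text from the question, including the stray space after [LINK1]. The variable names are only illustrative, and the /i flag is dropped because the pattern contains no letters.

```php
<?php
// Pattern from the answer: any number of spaces is allowed between "]" and "(".
// Single quotes keep the backslashes and the $1/$2 references literal.
$pattern = '/\[([^]]*)\] *\(([^)]*)\)/';
$replace = '<a href="$2" rel="nofollow">$1</a>';

// Sample input from the question, with a space after [LINK1].
$text = 'blah blah [LINK1] (http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?';

// Each link is replaced on its own; the space no longer merges them.
$html = preg_replace($pattern, $replace, $text);
echo $html, PHP_EOL;
```

With this input, both links should come out as separate anchors instead of being merged into one.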
Justification:
I should probably justify why I changed your .*? to [^]]*.
The second version is more efficient because it does not need to do the large amount of backtracking that .*? does. In addition, once it finds a [, the .*? version will carry on searching until it finds a match, instead of failing when the bracket is not part of the tag we want. For example, if we match the expression using .*? against:
Sad face :[ blah [LINK1](http://sub.example.com/) blah
it will match
[ blah [LINK1]
and
http://sub.example.com/
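Using [^]]* avoids this whenever a closing ] sits between the stray bracket and the real link, because the match then fails early and restarts at the next [. Note that [^]]* still allows an opening [ inside the link text, so for the exact input above it captures the same text as .*?; excluding both brackets with [^\[\]]* makes the match start at [LINK1] as intended.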
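To see how each pattern treats this input, here is a small sketch that compares the captured link text. The third pattern, which excludes both kinds of square bracket from the link text, is an assumption added for comparison and is not part of the original answer; the variable names are illustrative.

```php
<?php
// Input from the answer: a stray "[" appears before the real link.
$text = 'Sad face :[ blah [LINK1](http://sub.example.com/) blah';

$lazy    = '/\[(.*?)\]\((.*?)\)/';          // original pattern using .*?
$negated = '/\[([^]]*)\] *\(([^)]*)\)/';    // link text may not contain "]"
$strict  = '/\[([^\[\]]*)\] *\(([^)]*)\)/'; // assumption: also forbid "[" in the link text

preg_match($lazy, $text, $lazyMatch);
preg_match($negated, $text, $negatedMatch);
preg_match($strict, $text, $strictMatch);

// Group 1 is the captured link text, group 2 the URL.
var_dump($lazyMatch[1]);    // starts at the stray "["
var_dump($negatedMatch[1]); // "[" is still allowed, so it also starts at the stray "["
var_dump($strictMatch[1]);  // starts at the real link
```

Forbidding [ inside the link text is a design choice: it rejects stray opening brackets, but it also means link text can never contain a literal [.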