Why does my PHP regular expression for parsing Markdown links fail?
    $pattern = "/\[(.*?)\]\((.*?)\)/i";
    $replace = "<a href=\"$2\" rel=\"nofollow\">$1</a>";
    $text = "blah blah [LINK1](http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?";
    echo preg_replace($pattern, $replace, $text);

The above works, but if a space is accidentally inserted between [] and (), everything breaks and the two links are mixed into one:
    $text = "blah blah [LINK1] (http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?";

I have a feeling that it is the star that breaks it, but I do not know how else to match repeated links.
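To see exactly what goes wrong, the captured groups can be printed instead of doing the replacement. This is only a sketch for inspection: it reuses the pattern and text from the question, and the variable names $matches and $m are introduced here for illustration.

```php
<?php
// Pattern from the question: lazy groups, nothing allowed between "]" and "(".
$pattern = '/\[(.*?)\]\((.*?)\)/i';
$text = 'blah blah [LINK1] (http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?';

// Collect every match as its own set so the captured groups can be inspected.
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo "text: {$m[1]}\n"; // what ends up inside <a>...</a>
    echo "href: {$m[2]}\n"; // what ends up in the href attribute
}
```

Because "](" never appears right after LINK1, the lazy group keeps growing until it reaches the "](" after LINK2, so a single match swallows both links.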
If I understand you correctly, all you need to do is also match any number of spaces between the two, for example:
    /\[([^]]*)\] *\(([^)]*)\)/i

Explanation:

    \[          # Matches the opening square bracket (escaped)
    ([^]]*)     # Captures any number of characters that aren't close square brackets
    \]          # Match close square bracket (escaped)
     *          # Match any number of spaces
    \(          # Match the opening bracket (escaped)
    ([^)]*)     # Captures any number of characters that aren't close brackets
    \)          # Match the close bracket (escaped)

Justification:
I should probably justify why I changed your .*? to [^]]*.
The second version is more efficient because it does not need the large amount of backtracking that .*? does. In addition, after finding an opening [, the .*? version will carry on searching until it finds a match, instead of failing when what follows is not the kind of tag we want. For example, if we match the expression using .*? against:
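    Sad face :[ blah [LINK1](http://sub.example.com/) blah

it will match

    [ blah [LINK1]

and

    http://sub.example.com/

Using [^]]* stops a match from running past a closing square bracket, which is what keeps the two links in the question apart. Note that [^]]* on its own still accepts an opening [, so to skip the stray bracket in this example the class has to exclude both brackets, e.g. [^\[\]]*; with that, only [LINK1](http://sub.example.com/) is matched.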
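The sketch below runs the suggested pattern on the input from the question and then compares it, on the "Sad face" input, with a stricter variant that excludes both square brackets from the link text. The stricter pattern and the variable names ($suggested, $strict, $spaced, $stray) are assumptions added here for illustration, not part of the original answer.

```php
<?php
$replace = '<a href="$2" rel="nofollow">$1</a>';

// Pattern suggested above: negated classes plus optional spaces between "]" and "(".
$suggested = '/\[([^]]*)\] *\(([^)]*)\)/';

// Assumed stricter variant: also forbids "[" inside the link text.
$strict = '/\[([^\[\]]*)\] *\(([^)]*)\)/';

$spaced = 'blah blah [LINK1] (http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?';
$stray  = 'Sad face :[ blah [LINK1](http://sub.example.com/) blah';

// Each link is converted on its own, even with the stray space after [LINK1].
echo preg_replace($suggested, $replace, $spaced), "\n";

// Show what each pattern captures as link text on the stray-bracket input.
foreach (['suggested' => $suggested, 'strict' => $strict] as $name => $pattern) {
    preg_match($pattern, $stray, $m);
    echo $name, ': "', $m[1], '"', "\n";
}
```

On the stray-bracket input the suggested pattern still starts at the first "[", while the stricter class only captures LINK1. Whether link text should ever be allowed to contain "[" is a design choice; real Markdown permits nested brackets in some cases, so a full parser may be the safer route for untrusted input.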