The key difference is the *? part, which creates a reluctant quantifier, so it tries to match as little as possible. The standard quantifier * is a greedy quantifier and tries to match as much as possible.
See Greedy vs. Reluctant vs. Possessive Quantifiers.
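To make the greedy/reluctant difference concrete, here is a minimal Java sketch. The class name, the helper firstMatch, and the sample input are made up for illustration and are not taken from the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<p>one</p><p>two</p>"; // sample input, assumed for the demo

        // Greedy: .* grabs as much as possible, so the match runs to the last </p>.
        System.out.println(firstMatch("<p>.*</p>", html));   // <p>one</p><p>two</p>

        // Reluctant: .*? grabs as little as possible, so the match stops at the first </p>.
        System.out.println(firstMatch("<p>.*?</p>", html));  // <p>one</p>
    }
}
```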
As Set Robertson pointed out, you may want to use a regular expression that does not depend on greedy/reluctant behavior. Indeed, you could write a possessive regex for better performance:
<p\s*+[^>]*+>
Here \s*+ matches any number of whitespace characters, and [^>]*+ matches any number of characters except >. Neither quantifier backtracks on a mismatch, which improves the run time when the match fails and, in some regex implementations, also when it succeeds (since internal backtracking data can be omitted).
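The "no backtracking" behavior is easiest to see on a toy pattern. The following Java sketch is only an illustration; the class name, the helper matchesWhole, and the inputs are assumptions for the demo.

```java
import java.util.regex.Pattern;

public class PossessiveDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Greedy a* first takes all three a's, then gives one back so the final a can match.
        System.out.println(matchesWhole("a*a", "aaa"));   // true

        // Possessive a*+ takes all three a's and never gives one back, so the final a fails.
        System.out.println(matchesWhole("a*+a", "aaa"));  // false

        // In the tag pattern nothing ever needs to be given back: [^>] cannot consume the >.
        System.out.println(matchesWhole("<p\\s*+[^>]*+>", "<p class=\"x\">")); // true
    }
}
```

The possessive form is safe in the tag pattern because [^>]*+ stops by itself in front of the closing >, so giving up backtracking does not change what is matched.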
Note that if there are other tags starting with <p (I have not written HTML directly for a long time), this pattern matches them too. If you do not want that, use a regex like:
<p(\s++[^>]*+)?>
This makes the entire section between <p and > optional.
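As a rough check of the difference between the two patterns, the Java sketch below runs both against a few sample tags. The class name, the constant names LOOSE and STRICT, and the helper isTag are hypothetical; <pre> is used only as an example of another tag that starts with <p.

```java
import java.util.regex.Pattern;

public class ParagraphTagDemo {
    // First pattern from the answer: anything between <p and > is accepted.
    static final Pattern LOOSE = Pattern.compile("<p\\s*+[^>]*+>");
    // Second pattern: the part after <p must start with whitespace, or be absent.
    static final Pattern STRICT = Pattern.compile("<p(\\s++[^>]*+)?>");

    // Hypothetical helper: true if the whole string matches the pattern.
    static boolean isTag(Pattern pattern, String s) {
        return pattern.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isTag(LOOSE, "<pre>"));                 // true: matched by accident
        System.out.println(isTag(STRICT, "<pre>"));                // false
        System.out.println(isTag(STRICT, "<p>"));                  // true
        System.out.println(isTag(STRICT, "<p class=\"intro\">"));  // true
    }
}
```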