I know I'm late to this party, but there are several problems with the regex that the existing answers don't address. First, and most annoying, there's that forest of backslashes. If you use C# verbatim strings (@"..."), you don't have to do all that double escaping. And most of the backslashes weren't needed in the first place.
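As a rough illustration (assuming .NET 5 or later for top-level statements; the input string is invented), here is the same URL-prefix pattern written once as a regular string literal with doubled and mostly unnecessary escapes, and once as a verbatim string:

```csharp
using System;
using System.Text.RegularExpressions;

// Regular string literal: every regex backslash must be doubled, and ':' and '/'
// are escaped here even though they are not regex metacharacters.
string escaped = "https?\\:\\/\\/\\w+(?:\\.\\w+)+";

// Verbatim literal: backslashes reach the regex engine unchanged,
// and ':' and '/' are simply written as themselves.
string verbatim = @"https?://\w+(?:\.\w+)+";

const string Input = "see http://www.example.com for details";

string a = Regex.Match(Input, escaped).Value;
string b = Regex.Match(Input, verbatim).Value;

Console.WriteLine(a); // http://www.example.com
Console.WriteLine(b); // http://www.example.com
```

Both patterns find the same text; the verbatim form is simply easier to read and harder to get wrong.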
Second, there's this bit: ([\\w+?\\.\\w+])+. The square brackets form a character class, and everything inside them is treated either as a literal character or as a shorthand class like \w. But getting rid of the square brackets isn't enough to make it work. I suspect this is what you were going for: \w+(?:\.\w+)+.
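The difference is easy to see by anchoring both patterns and testing a few inputs. This is a small sketch; the test strings are invented for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Inside [...] each character is just a member of the set, so [\w+?\.\w+] means
// "a word character, '+', '?' or '.'", and ([...])+ matches any run of them.
string classVersion = @"^([\w+?\.\w+])+$";

// A word followed by one or more ".word" parts, using a non-capturing group.
string groupVersion = @"^\w+(?:\.\w+)+$";

// Hypothetical inputs.
string[] inputs = { "www.example.com", "localhost", "??..++" };

foreach (string s in inputs)
{
    Console.WriteLine(
        $"{s}: class={Regex.IsMatch(s, classVersion)}, group={Regex.IsMatch(s, groupVersion)}");
}
// www.example.com: class=True, group=True
// localhost: class=True, group=False
// ??..++: class=True, group=False
```

The bracketed version accepts any jumble of those characters, while the grouped version requires at least one dot between word runs.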
Third, the quantifiers at the end of the regex, ]*)?, are mismatched. * can match zero characters, so there's no point in making the enclosing group optional. Also, that kind of arrangement can cause severe performance degradation. See this page for more details.
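A quick check of that equivalence, as a sketch: the pattern a([-/\w]*) below is a made-up stand-in for the tail of the URL regex, not the original pattern. Since the group's body can already match the empty string, adding ? around it changes nothing about which inputs match:

```csharp
using System;
using System.Text.RegularExpressions;

// The group body [-/\w]* can already match nothing, so the trailing ? is redundant.
// It only adds another quantifier for a backtracking engine to explore.
string redundant  = @"^a([-/\w]*)?$";
string simplified = @"^a([-/\w]*)$";

// Hypothetical inputs, including ones both patterns reject.
string[] samples = { "a", "a/b-c", "ab", "a b", "" };

bool allAgree = true;
foreach (string s in samples)
{
    bool r = Regex.IsMatch(s, redundant);
    bool t = Regex.IsMatch(s, simplified);
    Console.WriteLine($"'{s}': {r} {t}");
    allAgree &= r == t;
}
Console.WriteLine(allAgree); // True
```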
There are other, minor problems, but I won't go into them right now. Here's the new and improved regex:
@"(?n)(https?|ftps?)://\w+(\.\w+)+([-a-zA-Z0-9~!@#$%^&*()_=+/?.:;',\\]*)(?!(?>[^<>]*)(>|</a>))"
The negative lookahead, (?!(?>[^<>]*)(>|</a>)), is what prevents matches inside tags or in the content of an anchor element. (.NET doesn't support possessive quantifiers like [^<>]*+, so the atomic group (?>[^<>]*) does the same job.) It's still very crude, though. There are several places, such as inside <script> elements, where you don't want it to match but it does. But trying to cover every possibility would result in a mile-long regex.
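Here is a minimal sketch of the lookahead at work, assuming .NET 5 or later; the sample HTML is made up. It writes the lookahead's [^<>]*+ as the atomic group (?>[^<>]*), since .NET has no possessive quantifiers, and leaves no stray spaces in the character class (a leading " -a" would form a range from space to 'a'):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The URL regex, with an atomic group in place of the possessive quantifier.
const string UrlPattern =
    @"(?n)(https?|ftps?)://\w+(\.\w+)+([-a-zA-Z0-9~!@#$%^&*()_=+/?.:;',\\]*)(?!(?>[^<>]*)(>|</a>))";

// Hypothetical input: a URL in an attribute, the same URL as link text,
// one in plain text, and one inside a <script> element.
const string Html =
    "<a href=\"http://example.com\">http://example.com</a> see http://other.org/page " +
    "<script>var u = \"http://script.example/x\";</script>";

string[] found = Regex.Matches(Html, UrlPattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string url in found)
    Console.WriteLine(url);
// http://other.org/page
// http://script.example/x
```

The attribute value and the link text are skipped, but the URL inside the script is still matched, which is exactly the kind of gap described above.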