Regex URL Replace, ignore images and existing links

Question

I have a regex that works well and turns the URLs in a line of text into clickable links.

string regex = @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])";

Now, how can I make it ignore URLs that are already inside links and images?

That is, it should leave markup like the following untouched:

 <a href="http://www.someaddress.com">Some Text</a> <img src="http://www.someaddress.com/someimage.jpg" />

Example:

 The website www.google.com, once again <a href="http://www.google.com">www.google.com</a>, the logo <img src="http://www.google.com/images/logo.gif" />

Result:

 The website <a href="http://www.google.com">www.google.com</a>, once again <a href="http://www.google.com">www.google.com</a>, the logo <img src="http://www.google.com/images/logo.gif" />

Full Parser Code for HTML:

 string regex = @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])";
 Regex r = new Regex(regex, RegexOptions.IgnoreCase);
 text = r.Replace(text, "<a href=\"$1\" title=\"Click to open in a new window or tab\" target=\"&#95;blank\" rel=\"nofollow\">$1</a>")
         .Replace("href=\"www", "href=\"http://www");
 return text;

+1

c# regex

Cindro Feb 21 '12 at 8:19

2 answers

Check out "Detect email in text using regular expression" and just swap in your regular expression for links. It will never replace a link inside a tag, only in text content.

http://htmlagilitypack.codeplex.com/

Something like this:

 string textToBeLinkified = "... your text here ...";
 const string regex = @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])";
 Regex urlExpression = new Regex(regex, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

 HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
 doc.LoadHtml(textToBeLinkified);

 // Select only text nodes that are not already inside an <a> element.
 // SelectNodes can return null when nothing matches, so check before iterating.
 var nodes = doc.DocumentNode.SelectNodes("//text()[not(ancestor::a)]");
 if (nodes != null)
 {
     foreach (var node in nodes)
     {
         node.InnerHtml = urlExpression.Replace(node.InnerHtml, @"<a href=""$0"">$0</a>");
     }
 }

 string linkifiedText = doc.DocumentNode.OuterHtml;
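If pulling in the HTML Agility Pack is not an option, a rough regex-only alternative is to let the pattern consume existing `<a>...</a>` elements and other tags first, and only linkify what is captured as a bare URL. This is a minimal sketch, not the approach from the answer above: the names `Linkifier`, `Linkify` and `Pattern` are made up for illustration, and the URL part is a deliberately simplified stand-in for the question's regex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Linkifier
{
    // Alternatives 1 and 2 consume markup that must stay untouched:
    // a whole <a>...</a> element, or any single tag (which covers <img ... />).
    // Only the named group "url" captures a bare URL in plain text.
    // The URL sub-pattern is a simplified assumption, not the question's regex.
    private static readonly Regex Pattern = new Regex(
        @"<a\b[^>]*>.*?</a>|<[^>]+>|(?<url>(?:www\.|(?:https?|ftp)://)[^\s<>""]*[^\s<>"".,;:!?)])",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Linkify(string html)
    {
        return Pattern.Replace(html, m =>
        {
            Group url = m.Groups["url"];
            if (!url.Success)
                return m.Value; // existing link or tag: copy through unchanged

            // Add a scheme to bare "www." hosts so the href is absolute.
            string href = url.Value.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
                ? "http://" + url.Value
                : url.Value;
            return "<a href=\"" + href + "\">" + url.Value + "</a>";
        });
    }
}
```

Calling `Linkifier.Linkify(...)` on the example line from the question should wrap only the first `www.google.com` and leave the existing link and the image alone. It is still a heuristic: comments, scripts, or malformed markup can trip it up, which is where a real HTML parser is the safer choice.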

+1

jessehouwing Feb 21 '12 at 16:44

Sam greenhalgh · Accepted Answer · 2012-02-21T12:54:34+0000

First, I'll post the obligatory link if no one else will: "RegEx match open tags except XHTML self-contained tags".

How about using a negative lookbehind / lookahead for " like this:

string regex = @"(?<!"")((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])(?!"")";
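To get a feel for what the two lookarounds do, here is a small sketch with a much simpler stand-in pattern (only `www.` hosts). `LookaroundDemo` and `FirstMatch` are hypothetical names used just for this illustration; the behaviour shown is for the simplified pattern, not the full regex above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Simplified stand-in for the URL regex: only "www." hosts, wrapped in
    // the same negative lookbehind and negative lookahead for a double quote.
    private static readonly Regex Quoted = new Regex(
        @"(?<!"")www\.[a-z0-9.-]*[a-z0-9](?!"")",
        RegexOptions.IgnoreCase);

    // Returns the first match in the input, or null when nothing matches.
    public static string FirstMatch(string input)
    {
        Match m = Quoted.Match(input);
        return m.Success ? m.Value : null;
    }
}
```

With this pattern, `FirstMatch("see www.example.com now")` finds `www.example.com`, and `FirstMatch("src=\"www.example.com\"")` finds nothing because the URL sits directly after a quote. Note that the lookarounds only inspect the characters right next to the match: for `href="http://www.example.com"` the inner `www.` follows a `/`, and the engine backtracks to the shorter match `www.example.co`. So it is worth testing a quote-based pattern against attribute values that start with a scheme before relying on it.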
