Regex URL Replace, ignore images and existing links

I have a regex that works well: it turns the URLs in a string into clickable links.

string regex = @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])"; 

Now, how can I tell it to ignore URLs that are already inside links and images?

That is, it should leave markup like the following untouched:

 <a href="http://www.someaddress.com">Some Text</a> <img src="http://www.someaddress.com/someimage.jpg" /> 

Example:

 The website www.google.com, once again <a href="http://www.google.com">www.google.com</a>, the logo <img src="http://www.google.com/images/logo.gif" /> 

Result:

 The website <a href="http://www.google.com">www.google.com</a>, once again <a href="http://www.google.com">www.google.com</a>, the logo <img src="http://www.google.com/images/logo.gif" /> 

Full Parser Code for HTML:

 string regex = @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])";
 Regex r = new Regex(regex, RegexOptions.IgnoreCase);
 text = r.Replace(text, "<a href=\"$1\" title=\"Click to open in a new window or tab\" target=\"&#95;blank\" rel=\"nofollow\">$1</a>")
         .Replace("href=\"www", "href=\"http://www");
 return text;
2 answers

First, I'll post the obligatory link if no one else will: "RegEx match open tags except XHTML self-contained tags".


How about using a negative lookbehind and lookahead for the " character, as follows:

string regex = @"(?<!"")((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])(?!"")";
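
The lookarounds above only inspect the single character on each side of a match, so they cannot tell, for instance, that the link text between <a ...> and </a> already sits inside an anchor (it is preceded by > rather than by a quote). As a rough regex-only alternative, not taken from either answer here, one common trick is to match existing anchors and tags first and hand them back unchanged, linkifying only what the URL branch catches. The class name Linkifier, the method Linkify, and the group names skip and url are made up for this sketch; the URL pattern is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Linkifier
{
    // URL pattern copied unchanged from the question.
    private const string UrlPattern =
        @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])";

    // Alternation order matters: a whole <a>...</a> element or any other tag
    // is tried first, so URLs inside them are consumed before the url branch
    // gets a chance to match.
    private static readonly Regex Pattern = new Regex(
        @"(?<skip><a\b[^>]*>.*?</a>|<[^>]+>)|(?<url>" + UrlPattern + ")",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Linkify(string text)
    {
        return Pattern.Replace(text, m =>
        {
            // Existing anchors and tags are returned exactly as they were.
            if (m.Groups["skip"].Success)
            {
                return m.Value;
            }

            string url = m.Groups["url"].Value;

            // Bare "www." addresses need a scheme to work as an href.
            string href = url.StartsWith("www", StringComparison.OrdinalIgnoreCase)
                ? "http://" + url
                : url;

            return "<a href=\"" + href + "\">" + url + "</a>";
        });
    }
}
```

Calling Linkifier.Linkify on the example line from the question should wrap only the first www.google.com and leave the existing anchor and the img tag alone. This is still regex-on-HTML, so malformed or nested markup can defeat it; a real HTML parser is the sturdier option.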


Check out "Detect email in text using regular expression" and just swap in your regular expression for links. It never replaces a link inside a tag, only in text content.

It relies on the HTML Agility Pack: http://htmlagilitypack.codeplex.com/

Something like this:


 string textToBeLinkified = "... your text here ...";
 const string regex = @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])";
 Regex urlExpression = new Regex(regex, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

 HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
 doc.LoadHtml(textToBeLinkified);

 // Select only text nodes that are not inside an existing <a> element.
 // Attribute values such as href and src are not text nodes, so they are skipped.
 var nodes = doc.DocumentNode.SelectNodes("//text()[not(ancestor::a)]");

 // SelectNodes returns null when nothing matches.
 if (nodes != null)
 {
     foreach (var node in nodes)
     {
         // $0 is the whole match.
         node.InnerHtml = urlExpression.Replace(node.InnerHtml, @"<a href=""$0"">$0</a>");
     }
 }

 string linkifiedText = doc.DocumentNode.OuterHtml;

Source: https://habr.com/ru/post/905823/

