Okey dokey! With no "extra library", and "quick and easy", here you go:
<(?<Tag_Name>(a)|img)\b[^>]*?\b(?<URL_Type>(?(1)href|src))\s*=\s*(?:"(?<URL>(?:\\"|[^"])*)"|'(?<URL>(?:\\'|[^'])*)')
or as a C# string:
@"<(?<Tag_Name>(a)|img)\b[^>]*?\b(?<URL_Type>(?(1)href|src))\s*=\s*(?:""(?<URL>(?:\\""|[^""])*)""|'(?<URL>(?:\\'|[^'])*)')"
This captures the tag name (a or img) in the Tag_Name group, the URL type (href or src) in the URL_Type group, and the URL itself in the URL group (I know, I got a bit carried away with the group names).
It handles either kind of quote (" or '), and although any quotes inside the URL should already be encoded, it will skip over backslash-escaped quotes (\' and \").
It does not skip unclosed tags (that is, malformed HTML). It finds the opening of one of the tags, such as <a or <img, then skips over everything except a greater-than sign (>) until it reaches the matching URL attribute (href for a tags, src for img tags), and then matches the attribute's contents. At that point it stops and does not worry about the rest of the tag!
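If it helps, here is a minimal sketch of how the pattern could be used from C#. The LinkExtractor class, the Extract method, and the sample HTML are all made up for illustration; only the pattern itself comes from this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class LinkExtractor
{
    // The pattern from this answer, unchanged. The unnamed (a) group is
    // group 1, which the (?(1)href|src) conditional tests.
    private static readonly Regex LinkPattern = new Regex(
        @"<(?<Tag_Name>(a)|img)\b[^>]*?\b(?<URL_Type>(?(1)href|src))\s*=\s*(?:""(?<URL>(?:\\""|[^""])*)""|'(?<URL>(?:\\'|[^'])*)')");

    // Returns one (tag, attribute, url) tuple per match, in document order.
    public static List<(string Tag, string UrlType, string Url)> Extract(string html)
    {
        var results = new List<(string Tag, string UrlType, string Url)>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            results.Add((m.Groups["Tag_Name"].Value,
                         m.Groups["URL_Type"].Value,
                         m.Groups["URL"].Value));
        }
        return results;
    }

    public static void Main()
    {
        // Made-up sample input: a double-quoted href, a single-quoted src,
        // and an unclosed <a tag at the very end.
        string html = "<a class=\"x\" href=\"http://example.com/a\">one</a>"
                    + "<img alt='pic' src='/images/b.png' width='10'>"
                    + "<a href='c.html'";

        foreach (var (tag, urlType, url) in Extract(html))
        {
            Console.WriteLine($"{tag} {urlType} {url}");
        }
    }
}
```

For that sample input this should print three lines: "a href http://example.com/a", "img src /images/b.png" and "a href c.html". The last one shows that the unclosed tag is still picked up, and the width attribute on the img tag shows that anything after the URL is left alone.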
Let me know if you want me to break it down for you, but here is a selection of the matches it found on this very page:
<Match>                                   'Tag'  'URL_Type'  'URL'
----------------------------------------  -----  ----------  -----------------------------
<a href="http://meta.stackoverflow.com"   a      href        http:
A total of 140 tags found (I assume additional posts will increase this slightly).