Nicolas Carey is right to point you to RFC-3986. The regular expression it provides will match a generic URI, but it will not validate it (and it is poorly suited to picking URLs out of the "wild": it is too loose and matches almost any string, including the empty string).
Regarding the validation requirement, you may want to look at an article I wrote on this subject, which takes all the ABNF syntax definitions of the various components from Appendix A and provides an equivalent regular expression for each:
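To see how loose that expression is, here is a small check. It is only a sketch: the pattern is the generic URI-splitting regex printed in RFC 3986, Appendix B (assumed to be the expression referred to above), and the sample strings are made up for illustration.

```php
<?php
// Generic URI-splitting regex from RFC 3986, Appendix B.
// '~' is used as the delimiter so the slashes need no escaping.
$rfc3986_split = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';

// Hypothetical sample inputs, chosen only for illustration.
$samples = ['http://example.com/a?b=1#c', 'not a url at all', ''];

$results = [];
foreach ($samples as $s) {
    // preg_match() returns 1 on a match, 0 on no match.
    $results[$s] = preg_match($rfc3986_split, $s);
    echo var_export($s, true), ' => ', $results[$s], "\n";
}
```

Every call returns 1, including the one on the empty string, because each component in the pattern is optional. The expression splits a string into URI parts; it does not decide whether the string is a valid URI.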
Validating a URI
Regarding the question of picking URLs out of the "wild", look at Jeff Atwood's "The Problem With URLs" and John Gruber's "An Improved Liberal, Accurate Regex Pattern for Matching URLs" to get a feel for some of the subtle issues that can arise. You may also want to look at a project I started last year: URL Linkification. It picks out unlinked HTTP and FTP URLs from text that may already contain some links.
That said, the following PHP function uses a slightly modified version of the RFC-3986 "Absolute URI" regular expression to validate HTTP and FTP URLs (with this regular expression, the host part must not be empty). All the components of the URI are isolated and captured in named groups, which makes it easy to manipulate and validate the parts in program code:

    function url_valid($url) {
        // Assume a scheme for bare "www." and "ftp." host names.
        if (strpos($url, 'www.') === 0) $url = 'http://'. $url;
        if (strpos($url, 'ftp.') === 0) $url = 'ftp://'. $url;
        // The regex uses free-spacing mode (x), so it must stay on multiple
        // lines: the leading "#" comment runs to the end of its line.
        if (!preg_match('/# Valid absolute URI having a non-empty, valid DNS host.
            ^
            (?P<scheme>[A-Za-z][A-Za-z0-9+\-.]*):\/\/
            (?P<authority>
              (?:(?P<userinfo>(?:[A-Za-z0-9\-._~!$&\'()*+,;=:]|%[0-9A-Fa-f]{2})*)@)?
              (?P<host>
                (?P<IP_literal>
                  \[
                  (?:
                    (?P<IPV6address>
                      (?:                                                (?:[0-9A-Fa-f]{1,4}:){6}
                      |                                                ::(?:[0-9A-Fa-f]{1,4}:){5}
                      | (?:                          [0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){4}
                      | (?:(?:[0-9A-Fa-f]{1,4}:){0,1}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){3}
                      | (?:(?:[0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){2}
                      | (?:(?:[0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})?::   [0-9A-Fa-f]{1,4}:
                      | (?:(?:[0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})?::
                      )
                      (?P<ls32>[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}
                      | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
                           (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
                      )
                    |   (?:(?:[0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})?::   [0-9A-Fa-f]{1,4}
                    |   (?:(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})?::
                    )
                  | (?P<IPvFuture>[Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&\'()*+,;=:]+)
                  )
                  \]
                )
              | (?P<IPv4address>(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
                                   (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))
              | (?P<regname>(?:[A-Za-z0-9\-._~!$&\'()*+,;=]|%[0-9A-Fa-f]{2})+)
              )
              (?::(?P<port>[0-9]*))?
            )
            (?P<path_abempty>(?:\/(?:[A-Za-z0-9\-._~!$&\'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*)
            (?:\?(?P<query>       (?:[A-Za-z0-9\-._~!$&\'()*+,;=:@\\/?]|%[0-9A-Fa-f]{2})*))?
            (?:\#(?P<fragment>    (?:[A-Za-z0-9\-._~!$&\'()*+,;=:@\\/?]|%[0-9A-Fa-f]{2})*))?
            $
            /mx', $url, $m)) return FALSE;
        switch ($m['scheme']) {
        case 'https':
        case 'http':
            if ($m['userinfo']) return FALSE; // Reject userinfo in HTTP(S) URLs.
            break;
        // NOTE: the listing breaks off at this point in this copy. Everything
        // below is a minimal assumed completion so that the function parses.
        case 'ftp':
            break;
        default:
            return FALSE; // Assumed: schemes other than HTTP(S) and FTP are rejected.
        }
        // The second regex (the DNS check of $m['regname']) is not reproduced here.
        return $m; // Assumed return value: the named groups captured above.
    }

The first regular expression validates the string as an absolute generic URI (one with a non-empty host part). The second regular expression checks the named host part (when it is not an IP literal or an IPv4 address) against the rules of the DNS lookup system: each dot-separated subdomain is 63 characters or fewer and consists of digits, letters and dashes, and the overall length is less than 255 characters.
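The host-name rules just described can be sketched as a standalone check. This is a minimal sketch, not the second regular expression from the function itself: the name `is_dns_hostname` is hypothetical, and two details are assumptions that go beyond the description above, namely that a label may not start or end with a dash and that a single trailing dot is accepted.

```php
<?php
// Hypothetical helper, written for illustration only: checks a host name
// against the DNS rules described above.
function is_dns_hostname(string $host): bool
{
    return (bool) preg_match('/
        ^
        (?!.{255})                  # Overall length is less than 255 characters.
        (?:                         # Zero or more leading labels, each one:
          [A-Za-z0-9]               #   starts with a letter or digit (assumption),
          (?:[A-Za-z0-9\-]{0,61}    #   has up to 61 inner letters, digits or dashes,
             [A-Za-z0-9])?          #   ends with a letter or digit (63 chars max),
          \.                        #   and is followed by a dot.
        )*
        [A-Za-z0-9]                 # Final label, same rules as above.
        (?:[A-Za-z0-9\-]{0,61}[A-Za-z0-9])?
        \.?                         # Assumption: one trailing dot is accepted.
        \z                          # End of string (no trailing newline allowed).
        /x', $host);
}

var_dump(is_dns_hostname('www.example.com'));                 // bool(true)
var_dump(is_dns_hostname(str_repeat('a', 64) . '.example'));  // bool(false)
```

In the validator above, a check of this kind would apply only when the `regname` group matched, since IP literals and IPv4 addresses are already validated by the first expression.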
Please note that the structure of this function allows easy expansion to include other schemes.