PHP regex for validating a URL

I am looking for a suitable regex to match a URL (a full URL with scheme, domain, path, etc.). I would normally use filter_var, but in this case I cannot, since I need to support PHP < 5.2!

I searched on the Internet, but I can’t find anything that I’m sure will be perfect, and all I can find on SO are people who say they use filter_var.

Does anyone have a regex that they use to do this?

My code (just so you can see what I'm trying to achieve):

    function validate_url($url) {
        if (function_exists('filter_var')) {
            return filter_var($url, FILTER_VALIDATE_URL);
        }
        return preg_match(REGEX_HERE, $url);
    }
4 answers

You can try this one. I have not tested it myself, but it is by far the biggest regular expression I have ever seen, haha.

 ^(?#Protocol)(?:(?:ht|f)tp(?:s?)\:\/\/|~\/|\/)?(?#Username:Password)(?:\w+:\w+@)?(?#Subdomains)(?:(?:[-\w]+\.)+(?#TopLevel Domains)(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))(?#Port)(?::[\d]{1,5})?(?#Directories)(?:(?:(?:\/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|\/)+|\?|#)?(?#Query)(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?#Anchor)(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?$

I created a domain validation solution. Although it does not cover the entire URL, it is very detailed and specific. The question you need to ask yourself is: "Why am I validating the domain?" If you need to make sure that the domain really exists, you need to verify the domain (including a valid TLD). The problem is that too many developers take the shortcut ([a-z]{2,4}) and call it good. If you are thinking along these lines, then why call it URL validation? It is not. It is simply passing the URL through a regex.

I have an open source class that checks the domain not only against a single authoritative source for TLDs (iana.org), but also through DNS records to make sure it really exists. The DNS check is optional, but the TLD check always applies, so a domain that passes will at least have a valid TLD.

For example: example.ay is NOT a valid domain because the TLD .ay is not valid. But with the regular expression posted here ([a-z]{2,4}), it will pass. I have an affinity for quality, and I try to express that in the code I write. Others may not bother. So if you just want to “check” the URL, you can use the examples listed in these answers. If you really want to validate the domain in the URL, you can have a look at the class I created to do just that. You can download it at: http://code.google.com/p/blogchuck/source/browse/trunk/domains.php

It validates against the RFCs that “govern” (using the term loosely) what makes a valid domain. In short, here is what the domain class will do:

Basic Domain Validation Rules

  • must be at least one character long
  • must begin with a letter or number
  • may contain only letters, numbers and hyphens
  • must end with a letter or number
  • may contain multiple nodes (i.e. node1.node2.node3)
  • each node can be up to 63 characters long
  • the full domain name can be no more than 255 characters
  • must end with a valid TLD
  • may be an IPv4 address
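The rules above can be sketched roughly as follows. This is only an illustration, not the code of the class linked in this answer: the function name is_valid_domain and the caller-supplied $validTlds list are assumptions made for the example.

```php
<?php
// Hypothetical helper (not from the linked class): checks a host name
// against the basic rules listed above.
// $validTlds is an assumed, caller-supplied array of lowercase TLDs.
function is_valid_domain($domain, $validTlds)
{
    $domain = strtolower($domain);

    // An IPv4 address is accepted when every octet is in the range 0-255.
    if (preg_match('/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/D', $domain, $m)) {
        for ($i = 1; $i <= 4; $i++) {
            if ((int) $m[$i] > 255) {
                return false;
            }
        }
        return true;
    }

    // Full name: between 1 and 255 characters.
    $length = strlen($domain);
    if ($length < 1 || $length > 255) {
        return false;
    }

    $nodes = explode('.', $domain);
    foreach ($nodes as $node) {
        // Each node: 1 to 63 characters, only letters, digits and hyphens,
        // beginning and ending with a letter or digit.
        if (!preg_match('/^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/D', $node)) {
            return false;
        }
    }

    // The last node must be one of the known TLDs.
    return in_array($nodes[count($nodes) - 1], $validTlds, true);
}
```

With a TLD list of array('com', 'org'), this accepts example.com and rejects example.ay, which a bare [a-z]{2,4} check would let through.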

It also downloads a copy of the master TLD file from iana.org, but only after checking your local copy: if the local copy is 30 days old, it downloads a new one. The TLDs in the file are then used in the regex that verifies the TLD of the domain you are checking. This prevents .ay (and other invalid TLDs) from validating.
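A minimal sketch of that caching idea, under stated assumptions: the function names are hypothetical, and the download URL is IANA's plain-text TLD list, which may not be the exact file the class uses. It is not the code of the linked class.

```php
<?php
// Assumed source: IANA's plain-text list (one TLD per line, '#' comments).
define('TLD_SOURCE_URL', 'https://data.iana.org/TLD/tlds-alpha-by-domain.txt');

// True when the local copy is missing or at least $maxAgeDays old.
// $now can be passed in to make the check testable; defaults to time().
function tld_cache_is_stale($cacheFile, $maxAgeDays = 30, $now = null)
{
    if ($now === null) {
        $now = time();
    }
    clearstatcache();
    if (!file_exists($cacheFile)) {
        return true;
    }
    return ($now - filemtime($cacheFile)) >= $maxAgeDays * 86400;
}

// Turns the file contents into an array of lowercase TLDs,
// skipping blank lines and comment lines.
function parse_tld_list($contents)
{
    $tlds = array();
    foreach (preg_split('/\r\n|\r|\n/', $contents) as $line) {
        $line = strtolower(trim($line));
        if ($line !== '' && $line[0] !== '#') {
            $tlds[] = $line;
        }
    }
    return $tlds;
}

// Loads the TLD list, refreshing the local copy first when it is stale.
// Downloading a remote URL this way requires allow_url_fopen.
function load_tlds($cacheFile, $sourceUrl = TLD_SOURCE_URL)
{
    if (tld_cache_is_stale($cacheFile)) {
        $fresh = @file_get_contents($sourceUrl);
        if ($fresh !== false) {
            file_put_contents($cacheFile, $fresh);
        }
    }
    clearstatcache();
    if (!file_exists($cacheFile)) {
        return array(); // no local copy and the download failed
    }
    return parse_tld_list(file_get_contents($cacheFile));
}

// Builds a regex that matches when a name ends in one of the given TLDs.
function build_tld_regex($tlds)
{
    $quoted = array();
    foreach ($tlds as $tld) {
        $quoted[] = preg_quote($tld, '/');
    }
    return '/\.(?:' . implode('|', $quoted) . ')$/iD';
}
```

If the download fails, the sketch keeps using the old local copy rather than rejecting every domain; whether that is the right trade-off depends on how strict the validation has to be.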

This is a long bit of code, but very compact considering what it does, and it is the most accurate approach. That is why I asked the question earlier: do you want to actually “validate” the domain, or just “check” it against a pattern?


I have seen a regex that could actually validate any valid URL, but it was two pages long...

You should probably parse the URL with parse_url and then check that all of the parts you require are OK.

Addendum: This is the IsUrl method from my URL class:

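    public static function IsUrl($test)
    {
        // Reject anything containing a space.
        if (strpos($test, ' ') !== false) {
            return false;
        }
        // Require a dot somewhere after the first two characters.
        if (strpos($test, '.') > 1) {
            $check = @parse_url($test);
            // Require a scheme and a host name with at least one dot.
            return is_array($check)
                && isset($check['scheme'])
                && isset($check['host'])
                && count(explode('.', $check['host'])) > 1;
        }
        return false;
    }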

It tests the string and requires some basics of a URL, namely that a scheme is set and that the host name has a dot in it.

 !(https?://)?([-_a-z0-9]+\.)*([-_a-z0-9]+)\.([a-z]{2,4})(/?)(.*)!i

I use this regex to validate URLs. So far, this has never failed me :)
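For reference, a short sketch of how this pattern behaves with preg_match. The constant and function names are made up for the example, and the anchored variant is an assumption added for comparison, not part of the answer. Because the posted pattern has no ^ anchor, it matches as soon as something URL-like appears anywhere in the string.

```php
<?php
// The pattern from this answer, unchanged.
define('URL_PATTERN_LOOSE', '!(https?://)?([-_a-z0-9]+\.)*([-_a-z0-9]+)\.([a-z]{2,4})(/?)(.*)!i');

// Assumed variant for comparison: the same pattern anchored at the start.
define('URL_PATTERN_ANCHORED', '!^(https?://)?([-_a-z0-9]+\.)*([-_a-z0-9]+)\.([a-z]{2,4})(/?)(.*)!i');

// Hypothetical wrapper that turns preg_match's 0/1/false into a boolean.
function matches_url_pattern($pattern, $url)
{
    return preg_match($pattern, $url) === 1;
}

// Both patterns accept an ordinary URL.
var_dump(matches_url_pattern(URL_PATTERN_LOOSE, 'http://www.example.com/path?x=1'));
var_dump(matches_url_pattern(URL_PATTERN_ANCHORED, 'http://www.example.com/path?x=1'));

// Only the unanchored pattern accepts text that merely contains a host name.
var_dump(matches_url_pattern(URL_PATTERN_LOOSE, 'hello world.com'));
var_dump(matches_url_pattern(URL_PATTERN_ANCHORED, 'hello world.com'));
```

Whether the looser behavior is acceptable depends on whether the input is expected to be a URL and nothing else.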


Source: https://habr.com/ru/post/1303299/

