I created a solution for validating the domain. It does not cover the entire URL, but it is very detailed and specific. The question you need to ask yourself is: "Why am I validating the domain?" If the goal is to make sure the domain could really exist, you need to confirm the whole domain, including a valid TLD. The problem is that too many developers take the shortcut of ([a-z]{2,4}) and call it good. If you think along those lines, why call it URL validation? It isn't. It is just passing the URL through a regex.
I have an open source class that validates the domain not only against a single authoritative source for TLDs (iana.org), but can also check the domain through DNS records to make sure it really exists. The DNS check is optional, but the TLD check always runs, so any domain that passes is guaranteed to end in a valid TLD.
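The optional DNS step can be sketched roughly as follows. This is not the code from the class (which is PHP); it is a minimal Python illustration that only looks for address records via the standard `socket.getaddrinfo`. The function name `domain_resolves` and the injectable `resolver` parameter are assumptions made for the example, the latter so the check can be exercised without network access.

```python
import socket

def domain_resolves(domain, resolver=socket.getaddrinfo):
    """Return True if the resolver finds at least one address for the domain."""
    try:
        # Port is None because only name resolution matters here
        return len(resolver(domain, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the lookup failed; UnicodeError: the name cannot be encoded
        return False
```

A domain can be syntactically valid and still fail this check, which is why it makes sense to keep it as a separate, optional step.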
For example, example.ay is NOT a valid domain because the TLD .ay is not valid. But the regular expression posted here, ([a-z]{2,4}), would let it pass. I have an affinity for quality, and I try to express that in the code I write. Others may not care. So if you just want to “validate” the URL, you can use the examples listed in the other answers here. If you actually want to validate the domain in the URL, you can have a look at the class I created to do just that. You can download it at: http://code.google.com/p/blogchuck/source/browse/trunk/domains.php
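To make the difference concrete, here is a small Python sketch (not taken from the class) that compares the shortcut pattern with a lookup against a TLD list. `SAMPLE_TLDS` is a tiny hand-picked set used only for illustration, and the function names are hypothetical.

```python
import re

# The shortcut: any 2-4 letters after the last dot counts as a "TLD"
SHORTCUT = re.compile(r"[a-z0-9-]+\.([a-z]{2,4})")

# Illustrative sample only; a real check would use the full iana.org list
SAMPLE_TLDS = {"com", "org", "net", "museum"}

def passes_shortcut(domain):
    return SHORTCUT.fullmatch(domain.lower()) is not None

def has_known_tld(domain, tlds=SAMPLE_TLDS):
    # Compare the text after the last dot against the known TLDs
    return domain.lower().rsplit(".", 1)[-1] in tlds

print(passes_shortcut("example.ay"))      # True: the shortcut accepts it
print(has_known_tld("example.ay"))        # False: .ay is not in the list
print(passes_shortcut("example.museum"))  # False: six letters, rejected
print(has_known_tld("example.museum"))    # True
```

The shortcut fails in both directions: it accepts a TLD that does not exist and rejects a real one that is longer than four letters.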
It validates based on the RFCs that “govern” (using the term loosely) what makes a valid domain name. In short, here is what the domain class does:

Basic Domain Validation Rules
- must be at least one character long
- must begin with a letter or number
- may contain only letters, numbers and hyphens
- must end with a letter or number
- may contain multiple nodes (i.e. node1.node2.node3)
- each node can be up to 63 characters long
- the entire domain name can be no more than 255 characters
- must end with a valid TLD
- may be an IPv4 address
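
The rules above can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the code from the class: the names `NODE`, `is_ipv4` and `is_valid_domain` are hypothetical, the begin/end and hyphen rules are applied to every node, and the TLD list is passed in by the caller.

```python
import ipaddress
import re

# One node: begins and ends with a letter or digit, hyphens only in between,
# 1 to 63 characters in total
NODE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")

def is_ipv4(text):
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False

def is_valid_domain(name, tlds):
    """Apply the basic rules; tlds is a collection of lowercase TLDs."""
    name = name.lower()
    if is_ipv4(name):
        return True
    if not 1 <= len(name) <= 255:
        return False
    nodes = name.split(".")
    if not all(NODE.fullmatch(node) for node in nodes):
        return False
    # The last node must be a known TLD
    return nodes[-1] in tlds
```

For instance, `is_valid_domain("node1.node2.example.com", {"com"})` passes, while `"example.ay"`, `"-bad.com"` and a name with a 64-character node are all rejected.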
It will also download a copy of the master TLD file from iana.org, but only after checking your local copy. If your local copy is 30 days old, it downloads a new one. The TLDs in the file are then used in a regex to validate the TLD of the domain you are checking. This prevents .ay (and other invalid TLDs) from passing validation.
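The refresh logic might look something like the following Python sketch. It is not the class's code. I am assuming the TLD file is the plain-text list IANA publishes at the URL in `TLD_URL`, and the names `load_tlds`, `fetch_tld_file` and `build_tld_regex` are hypothetical. The `fetch` and `now` parameters exist only so the caching can be tried without a network connection.

```python
import os
import re
import time
import urllib.request

# Assumed location of the IANA TLD list (one TLD per line, '#' comment on top)
TLD_URL = "https://data.iana.org/TLD/tlds-alpha-by-domain.txt"
MAX_AGE_SECONDS = 30 * 24 * 60 * 60  # 30 days

def fetch_tld_file(url=TLD_URL):
    with urllib.request.urlopen(url) as response:
        return response.read().decode("ascii")

def load_tlds(cache_path, fetch=fetch_tld_file, now=None):
    """Return lowercase TLDs, refreshing the local copy if missing or stale."""
    now = time.time() if now is None else now
    stale = (not os.path.exists(cache_path)
             or now - os.path.getmtime(cache_path) >= MAX_AGE_SECONDS)
    if stale:
        with open(cache_path, "w", encoding="ascii") as handle:
            handle.write(fetch())
    with open(cache_path, encoding="ascii") as handle:
        lines = [line.strip().lower() for line in handle]
    # Skip blank lines and the comment header
    return [line for line in lines if line and not line.startswith("#")]

def build_tld_regex(tlds):
    """Build a pattern that matches only when the text ends in a listed TLD."""
    ordered = sorted(tlds, key=len, reverse=True)
    alternatives = "|".join(re.escape(tld) for tld in ordered)
    return re.compile(r"\.(?:" + alternatives + r")\Z", re.IGNORECASE)
```

With a pattern built from `["com", "org"]`, `pattern.search("example.com")` matches and `pattern.search("example.ay")` does not.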
It is a lengthy bit of code, but very compact considering what it does, and it is the most accurate. That is why I asked the question earlier: do you want to actually validate the domain, or just “validate” it?