Can a top-level domain contain a number?

Can a top-level domain contain a number at the end? I don't know anything about the DNS rules, but when I pass test@null.com1 to PHP's filter_var() function with FILTER_VALIDATE_EMAIL, the address is accepted as valid.

3 answers

Conceptually, there is nothing that prohibits numbers in TLDs and in the future, who knows, there may be numeric TLDs.

There are currently no TLDs that have numbers in them. The function probably does not check against a list of known TLDs (since that list can change) but only validates the name lexically.
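To make the difference between the two kinds of check concrete, here is a minimal Python sketch. It is not how filter_var() is implemented; the pattern, the function names and the tiny KNOWN_TLDS set are all assumptions made up for illustration.

```python
import re

# Purely syntactic rule for one label: letters, digits and hyphens,
# no hyphen at either end, at most 63 characters (an assumed rule).
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# Hypothetical, deliberately tiny stand-in for a real list of TLDs.
KNOWN_TLDS = {"com", "org", "net"}


def last_label(domain):
    """Return the rightmost label of a domain name."""
    return domain.rsplit(".", 1)[-1]


def tld_is_lexically_valid(domain):
    """Lexical check: only the shape of the last label is examined."""
    return bool(LABEL_RE.fullmatch(last_label(domain)))


def tld_is_known(domain):
    """List check: the last label must appear in KNOWN_TLDS."""
    return last_label(domain).lower() in KNOWN_TLDS
```

With these definitions, null.com1 passes the lexical check but fails the list check, which would explain the behaviour described in the question if the validator works lexically.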


In fact, there are currently several TLDs that contain numbers:

 XN--1QQW23A XN--3BST00M XN--3DS443G XN--3E0B707E XN--45BRJ9C XN--4GBRIM XN--55QW42G XN--55QX5D XN--6FRZ82G XN--6QQ986B3XL XN--80ADXHKS XN--80AO21A XN--80ASEHDB XN--80ASWG XN--90A3AC XN--C1AVG XN--CG4BKI XN--CLCHC0EA0B2G2A9GCD XN--CZR694B XN--CZRU2D XN--D1ACJ3B XN--FIQ228C5HS XN--FIQ64B XN--FIQS8S XN--FIQZ9S XN--FPCRJ9C3D XN--FZC2C9E2C XN--GECRJ9C XN--H2BRJ9C XN--I1B6B1A6A2E XN--IO0A7I XN--J1AMH XN--J6W193G XN--KPRW13D XN--KPRY57D XN--KPUT3I XN--L1ACC XN--LGBBAT1AD8J XN--MGB9AWBF XN--MGBA3A4F16A XN--MGBAAM7A8H XN--MGBAB2BD XN--MGBAYH7GPA XN--MGBBH1A71E XN--MGBC0A9AZCG XN--MGBERP4A5D4AR XN--MGBX4CD0AB XN--NGBC5AZD XN--NQV7F XN--NQV7FS00EMA XN--O3CW4H XN--OGBPF8FL XN--P1AI XN--PGBS0DH XN--Q9JYB4C XN--RHQV96G XN--S9BRJ9C XN--SES554G XN--UNUP4Y XN--VHQUV XN--WGBH1C XN--WGBL6A XN--XHQ521B XN--XKC2AL3HYE2A XN--XKC2DL3A5EE0H XN--YFRO4I67O XN--YGBI2AMMX XN--ZFR164B 

You can see the up-to-date list at data.iana.org/TLD/tlds-alpha-by-domain.txt, or a list with descriptions at swcs.com.au/tld.htm
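If you have a copy of that IANA file, picking out the TLDs that contain digits takes a few lines. The sketch below is in Python and assumes the file's layout of one upper-case TLD per line, with comment lines starting with "#". The function name and the SAMPLE excerpt are made up for illustration; nothing is downloaded.

```python
def tlds_with_digits(listing):
    """Return the TLDs in `listing` that contain at least one ASCII digit."""
    result = []
    for line in listing.splitlines():
        line = line.strip()
        # Skip blank lines and the comment header.
        if not line or line.startswith("#"):
            continue
        if any(ch in "0123456789" for ch in line):
            result.append(line)
    return result


# Hypothetical excerpt in the same layout as the IANA file.
SAMPLE = """\
# hypothetical excerpt, not the real file
COM
ORG
XN--P1AI
XN--FIQS8S
XN--VHQUV
"""

print(tlds_with_digits(SAMPLE))  # ['XN--P1AI', 'XN--FIQS8S']
```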


Can a top-level domain contain a number at the end?

Technically yes, unless it is purely numeric: an all-numeric label cannot be a TLD under current rules, for an obvious reason (to avoid ambiguity with IP addresses). In practice, however, a TLD cannot contain a digit at the end either, except for IDN TLDs, for policy reasons imposed by ICANN.

Let's go back to some RFCs to get clearer definitions of things:

RFC 952: DOD INTERNET HOST TABLE SPECIFICATION (October 1985)

This was the definition of an Internet host name at the time:

A "name" (Net, Host, Gateway, or Domain name) is a text string up
to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus
sign (-), and period (.). Note that periods are only allowed when
they serve to delimit components of "domain style names". (See
RFC-921, "Domain Name System Implementation Schedule", for
background). No blank or space characters are permitted as part of a name. No distinction is made between upper and lower case. The first character must be an alpha character. The last character must not be a minus sign or period.

Note that it also has:

Single character names or nicknames are not allowed.

Therefore, at this point:

  • com1 is a valid TLD
  • 3com is not ("The first character must be an alpha character.")
  • 42 is not (for the same reason)
  • 1 is not (for the same reason)
  • a is not ("Single character names or nicknames are not allowed.")
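The list above can be reproduced with a small check. This is a minimal Python sketch of the RFC 952 rules as applied to a single label such as a TLD. The function name is made up, and periods are left out of the pattern because one label contains none.

```python
import re

# First character alphabetic, last character a letter or digit,
# letters, digits and minus signs in between. Requiring two distinct
# end characters also rejects single-character names.
RFC952_LABEL = re.compile(r"[A-Za-z][A-Za-z0-9-]*[A-Za-z0-9]")


def is_valid_rfc952_label(label):
    """Check one label against the RFC 952 name rules quoted above."""
    # RFC 952 limits the whole name to 24 characters, so a single
    # label cannot be longer than that either.
    return len(label) <= 24 and bool(RFC952_LABEL.fullmatch(label))


for candidate in ("com1", "3com", "42", "1", "a"):
    print(candidate, is_valid_rfc952_label(candidate))
```

Only com1 comes out as valid, matching the list.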

RFC 1034: DOMAIN NAMES - CONCEPTS AND FACILITIES (November 1987)

This is one of the RFCs that created the DNS as we know it today. For compatibility reasons, it defines host names as a sequence of labels, where a label is defined as follows:

They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.

A TLD is just one label among others. By the rule above, com1 is a valid label and therefore a valid TLD, whereas 3com is not. That brings us directly to the next amendment.

RFC 1123: Requirements for Internet Hosts - Application and Support (October 1989)

This amends the previous RFC by changing one rule:

The syntax of a legal Internet host name was specified in RFC-952 [DNS:4]. One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit. Host software MUST support this more liberal syntax.

So, at this point:

  • com1 is a valid TLD
  • 3com is valid too
  • 42 is valid
  • 1 is valid
  • a is valid
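The effect of relaxing the first-character rule is easy to see when both label syntaxes are written as patterns. The Python sketch below is one reading of the RFC 1034 label rule and of its RFC 1123 relaxation. The names RFC1034_LABEL, RFC1123_LABEL and check are made up for illustration.

```python
import re

# RFC 1034: starts with a letter, ends with a letter or digit,
# letters, digits and hyphens inside, 63 characters at most.
RFC1034_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# RFC 1123: same, except that the first character may also be a digit.
RFC1123_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def check(label):
    """Return (valid under RFC 1034, valid under RFC 1123) for one label."""
    return (
        bool(RFC1034_LABEL.fullmatch(label)),
        bool(RFC1123_LABEL.fullmatch(label)),
    )


for candidate in ("com1", "3com", "42", "1", "a"):
    print(candidate, check(candidate))
```

Under these patterns 3com, 42 and 1 are rejected by the older rule and accepted by the relaxed one, while com1 and a are accepted by both.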

For numeric TLDs, the following applies, from the same document:

Whenever a user inputs the identity of an Internet host, it SHOULD be possible to enter either (1) a host domain name or (2) an IP address in dotted-decimal ("#.#.#.#") form. The host SHOULD check the string syntactically for a dotted-decimal number before looking it up in the Domain Name System.

and

If a dotted-decimal number can be entered without such identifying delimiters, then a full syntactic check must be made, because a segment of a host domain name is now allowed to begin with a digit and could legally be entirely numeric (see Section 6.1.2.4). However, a valid host name can never have the dotted-decimal form #.#.#.#, since at least the highest-level component label will be alphabetic.
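The "check for dotted-decimal first" step can be sketched with Python's standard ipaddress module. The function name and the two return strings are made up for illustration. Note that ipaddress also rejects strings such as 256.1.1.1 whose octets are out of range, so this sketch would hand those to the DNS; whether that is desirable is a separate question.

```python
import ipaddress


def classify_host_input(text):
    """Decide whether user input should be treated as an IPv4 address
    or looked up as a domain name, checking for dotted-decimal first."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        # Not a well-formed dotted-decimal address: treat it as a name.
        return "domain name"
    return "IPv4 address"


print(classify_host_input("74.125.244.123"))  # IPv4 address
print(classify_host_input("example.com"))     # domain name
```

With this ordering, a name whose four labels are all numeric would never reach the resolver, which is exactly the ambiguity the RFC text is concerned with.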

RFC 1738: Uniform Resource Locators (URL) (December 1994)

It also touches on TLDs, in its definition of the host part of a URL:

The fully qualified domain name of a network host, or its IP address as a set of four decimal digit groups separated by ".". Fully qualified domain names take the form as described in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123 [5]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumerical character and possibly also containing "-" characters. The rightmost domain label will never start with a digit, though, which syntactically distinguishes all domain names from the IP addresses.

RFC 3696: Application Techniques for Checking and Transformation of Names (February 2004)

This was necessary for the introduction of IDNs (Internationalized Domain Names), and it has the following:

Any characters, or combination of bits (as octets), are permitted in DNS names. However, there is a preferred form that is required by most applications. This preferred form has been the only one permitted in the names of top-level domains, or TLDs. In general, it is also the only form permitted in most second-level names registered in TLDs, although some names that are normally not seen by users obey other rules. It derives from the original ARPANET rules for the naming of hosts (i.e., the "hostname" rule) and is perhaps better described as the "LDH rule", after the characters that it permits. The LDH rule, as updated, provides that the labels (words or strings separated by periods) that make up a domain name must consist of only the ASCII [ASCII] alphabetic and numeric characters, plus the hyphen. No other symbols or punctuation characters are permitted, nor is blank space. If the hyphen is used, it is not permitted to appear at either the beginning or end of a label. There is an additional rule that essentially requires that top-level domain names not be all-numeric.

In fact, once IDNs are involved, and there are IDN TLDs (by now both ccTLDs and gTLDs), the chosen encoding produces an ASCII string of the form xn--something, where something can contain digits, including at the end, as shown in the other answers.
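To see where the digits in xn-- labels come from, you can encode a Unicode label yourself. The Python sketch below uses the standard library's "idna" codec, which implements the older IDNA 2003 rules; for these two labels it produces entries found in the IANA list. The function name to_a_label is made up for illustration.

```python
def to_a_label(u_label):
    """Encode one Unicode label into its ASCII xn-- form using
    Python's built-in "idna" codec (IDNA 2003)."""
    return u_label.encode("idna").decode("ascii")


# "\u0440\u0444" is the Cyrillic label of the Russian IDN ccTLD.
print(to_a_label("\u0440\u0444"))  # xn--p1ai
# "\u4e2d\u56fd" is the simplified Chinese label for China.
print(to_a_label("\u4e2d\u56fd"))  # xn--fiqs8s
```

The digits are simply part of the Punycode alphabet (a-z and 0-9), so they can land anywhere after the xn-- prefix.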

However, it is not clear where the "additional rule" comes from in the last sentence.

RFC 4697: Observed DNS Resolution Misbehavior (October 2006)

It does not define anything, but it provides some interesting facts:

The root name servers receive a significant number of A record queries where the QNAME looks like an IPv4 address.

and

A possible solution is to delegate these numeric TLDs from the root zone to a separate set of servers to absorb traffic.

This clearly shows that, in the wild, there are applications sending queries (perhaps by mistake) for names that are formatted exactly like IPv4 addresses, that is, with a fully numeric "TLD". At the very least it shows that this works technically.

In fact, there was an experiment in running a .42 registry, obviously entirely outside the ICANN ecosystem. You can see a summary of it at http://www.dotsauce.com/experimental-numeric-tld-42-domain/ and an archive of their main explanations at https://web.archive.org/web/20101222151118/http://register.42registry.org:80/ (in French).

It did not get far, even though it worked technically.

For example, it showed that Microsoft operating systems do not handle purely numeric TLDs by default, although a patch is provided for this: https://support.microsoft.com/en-us/help/947228/error-message-when-you-try-to-join-a-windows-vista-based-client-comput "When you try to join a Windows Vista-based client computer to a top-level domain (TLD) that has a purely numeric suffix, the Windows Vista-based client computer cannot join the domain. [..] This behavior is by design."

Internet-Draft draft-liman-tld-names-06: Top Level Domain Name Specification (November 2011)

Finally, this draft gives some explanation of why purely numeric TLDs, or even TLDs containing a single digit, are sometimes considered invalid even though that is not an explicit consequence of the specifications above:

(Section 2.1 below refers to the content in RFC 1123 above)

In addition, the DISCUSSION section of section 2.1 states:

  'However, a valid host name can never have the dotted-decimal form #.#.#.#, since at least the highest-level component label will be alphabetic.' [Section 2.1] 

Some implementers may have understood the phrase "will be alphabetic" above to be a protocol restriction.

But basically, it just recommends going with the flow and keeping the same restrictions:

Neither [RFC0952] nor [RFC1123] explicitly indicates the reasons for these restrictions. It can be assumed that human factors were a consideration; [RFC1123] seems to suggest that one of the reasons was to prevent confusion between IPv4 dotted decimal addresses and host domain names. In any case, it is reasonable to assume that restrictions have been adopted in some deployed programs and that changes to the rules should be made with caution.

Therefore, it proposed this definition:

traditional-tld-label = 1*63(ALPHA)

This draft never became an RFC because there was no consensus on it. You can find the thread with the objections at https://www.ietf.org/mail-archive/web/dnsop/current/msg08866.html and the main point of contention was whether a restriction had existed in the past that we are now trying to relax a bit, or whether there had never been a restriction in the first place and people had simply implemented their systems incorrectly.

For an example, see this Chromium/Chrome bug report: https://bugs.chromium.org/p/chromium/issues/detail?id=31405 Browsing failed for TLDs that start with a digit or are purely numeric (it worked when the TLD ended with a digit preceded by letters). This was not treated as a bug and was not fixed, because the browser ships with a list of TLDs, so it can know which ones are valid and which are not, in addition to checking their syntax.

ICANN Applicant Guidebook for new gTLDs (June 2012)

Available at https://newgtlds.icann.org/en/applicants/agb/guidebook-full-04jun12-en.pdf, starting on page 64:

The ASCII label (i.e., the label as transmitted on the wire) must be valid as specified in technical standards Domain Names: Implementation and Specification (RFC 1035), and Clarifications to the DNS Specification (RFC 2181) and any updates thereto.

The ASCII label must be a valid host name, as specified in the technical standards DOD Internet Host Table Specification (RFC 952), Requirements for Internet Hosts - Application and Support (RFC 1123), and Application Techniques for Checking and Transformation of Names (RFC 3696), Internationalized Domain Names in Applications (IDNA) (RFCs 5890-5894), and any updates thereto. This includes the following:

The ASCII label must consist entirely of letters (alphabetic characters a-z), or

The label must be a valid IDNA A-label (further restricted as described in Part II below).

Pay special attention to this: the ASCII label must consist entirely of letters (alphabetic characters a-z).

This immediately rules out any all-numeric label, and in fact any digit at all, including at the end, except in an IDN TLD, that is, one of the form xn--something.
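The two alternatives in that rule can be approximated in a few lines of Python. This is only a rough sketch: the function name is made up, the A-label branch checks just the xn-- prefix and the letter/digit/hyphen shape rather than full IDNA validity, and the Guidebook's other string requirements are ignored.

```python
import re

# Alternative 1: letters only, at most 63 of them.
LETTERS_ONLY = re.compile(r"[A-Za-z]{1,63}")

# Alternative 2, shape only: "xn--" followed by letters, digits and
# hyphens, not ending in a hyphen, 63 characters in total at most.
A_LABEL_SHAPE = re.compile(r"xn--[a-z0-9-]{0,58}[a-z0-9]", re.IGNORECASE)


def passes_tld_string_rule(label):
    """Rough approximation of the 'letters only or A-label' rule."""
    return bool(LETTERS_ONLY.fullmatch(label) or A_LABEL_SHAPE.fullmatch(label))


for candidate in ("com", "com1", "3com", "42", "XN--P1AI"):
    print(candidate, passes_tld_string_rule(candidate))
```

Here com and XN--P1AI pass, while com1, 3com and 42 are rejected, which is the practical answer to the original question.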

Please note that someone asked ICANN about this directly and received the following response, quoted at https://domaingang.com/domain-news/icann-applicant-handbook-this-is-why-we-cannot-have-numeric-gtlds/ :

Please note that numeric TLDs were banned in the first round of applications. The ban on numeric gTLDs in the Applicant Guidebook ( http://newgtlds.icann.org/en/applicants/agb ) is due to a number of technical issues regarding the ability of such domains to function properly. Domain names are often used where other kinds of identifiers, such as IP addresses, can be used.

The fact that a TLD is alphabetic is often the key factor software uses to identify a domain name. If TLDs such as ".123" were allowed, you could have a domain name of "74.125.244.123" that would be difficult to distinguish from an IP address of "74.125.244.123". There are other considerations: the technical standards documentation states that TLDs will be alphabetic, which has also been codified as an assumption in software.

The character restriction in the AGB is intended to limit these scenarios, in which such TLDs would be unlikely to work well in software, and also to limit the potential security issues that could arise from them.


Source: https://habr.com/ru/post/907266/

