Are email addresses allowed to contain non-alphanumeric characters?

I am creating a site using Django. A significant number of users from non-English-speaking countries may use the site.

I just want to know if there are any technical restrictions on the types of characters that an email address may contain.

Are email addresses allowed to contain English letters, digits, "_", "@" and "."?

Are they allowed to contain non-English letters such as "é" or "ü"?

Are they allowed to contain Chinese, Japanese, or other Unicode characters?

+49
email unicode internationalization domain-name
Oct 02 2018-10-10T00:
6 answers

An email address consists of two parts: the local part, which comes before the @, and the domain, which comes after it.

The rules for these parts are different:

For the local part you can use ASCII:

  • Latin letters A - Z and a - z
  • digits 0 - 9
  • special characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
  • the dot ., provided that it is not the first or last character and does not appear twice in a row
  • space and the characters " ( ) , : ; < > @ [ \ ] with restrictions (they are allowed only inside a quoted string, and a backslash or double quote must be preceded by a backslash)
  • in addition, since 2012, international characters above U+007F, encoded as UTF-8.

The domain part is more limited:

  • Latin letters A - Z and a - z
  • digits 0 - 9
  • the hyphen -, provided that it is not the first or last character; several hyphens in a row are allowed.

A regex to check this:

^(([^<>()\[\]\.,;:\s@\"]+(\.[^<>()\[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})$
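As a rough illustration, a pattern with this structure can be applied from Python using the standard `re` module. The helper name `looks_like_email` is made up for this sketch, `re.fullmatch` is used in place of explicit anchors, and the brackets inside the character classes are escaped. Treat it as one way to wire up the check, not as a complete RFC validator.

```python
import re

# Characters the pattern excludes from unquoted local parts and domain labels.
_ATOM = r'[^<>()\[\]\.,;:\s@"]'

# Same structure as the regex above: dot-separated atoms or a quoted string,
# then "@", then at least two dot-separated labels (the last one 2+ characters).
EMAIL_RE = re.compile(
    rf'(({_ATOM}+(\.{_ATOM}+)*)|(".+"))@(({_ATOM}+\.)+{_ATOM}{{2,}})'
)


def looks_like_email(address: str) -> bool:
    """Return True if the whole string matches the pattern (hypothetical helper)."""
    return EMAIL_RE.fullmatch(address) is not None


if __name__ == "__main__":
    for candidate in ["user@example.com", "ñoñó1234@server.com", "a..b@example.com"]:
        print(candidate, looks_like_email(candidate))
```

Because the character classes are negative, non-ASCII letters pass in both the local part and the domain, which suits sites with international users.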

Hope this saves you some time.

+32
May 19 '16 at 10:37

Well yes. Read (at least) this Wikipedia article.

I live in Argentina, and addresses such as ñoñó1234@server.com are allowed here.

+36
Oct 02 2018-10-10T00:

The allowed syntax of an email address is described in RFC 3696 and is quite involved.

The exact rule [for the local part; the part before "@"] is that any ASCII character, including control characters, may appear quoted, or in a quoted string. When quoting is needed, the backslash character is used to quote the following character
[...]
Without quotes, local parts may consist of any combination of alphabetic characters, digits, or any of the special characters ! # $ % & ' * + - / = ? ^ _ ` . { | } ~
[...]
Any characters, or combination of bits (as octets), are permitted in DNS names. However, there is a preferred form that is required by most applications ...

... and so on, in some depth.

+17
Oct 02 2018-10-10T00:

Instead of worrying about what email addresses can and cannot contain, which is not what you really care about, test whether your setup can actually send email to them; that is what you really care about! This means actually sending a confirmation email.

Otherwise, you cannot catch the much more common case of accidental typos that stay within whatever character set you devise. (Quick: is random@mydomain.com a valid address for me to use on your site, or not?) It also avoids needlessly alienating users by telling them that their perfectly valid and correct address is wrong. You may still be unable to handle some addresses (this is necessary alienation), as the other answers say: processing email addresses is not trivial. But that is something users need to find out if they want to give you an email address!

All you should check is that the user supplies some text before an @, some text after it, and that the address is not outrageously long (say, 1000 characters). If you want to show a warning ("This looks like trouble! Is there a typo? Double-check before continuing."), that is fine, but it should not block the process of adding the email address.

Of course, if you never intend to send them email, just accept whatever they enter. For example, the address might be used only for Gravatar, but Gravatar verifies all email addresses anyway.
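The minimal check described here could be sketched in Python as follows. The function name `passes_minimal_check` and the constant `MAX_LENGTH` are made up for illustration, and the 1000-character limit is just the figure suggested above.

```python
MAX_LENGTH = 1000  # "outrageously long" threshold suggested in the text


def passes_minimal_check(address: str) -> bool:
    """Accept anything with text before and after an @ that is not absurdly long."""
    if len(address) > MAX_LENGTH:
        return False
    # Split on the last "@": a quoted local part may itself contain "@".
    local, at, domain = address.rpartition("@")
    return at == "@" and bool(local) and bool(domain)


if __name__ == "__main__":
    for candidate in ["random@mydomain.com", "no-at-sign", "@missing-local", "x" * 2000]:
        print(candidate[:30], passes_minimal_check(candidate))
```

Anything that passes would then be confirmed by actually sending mail to it; a stricter pattern is better used for a non-blocking warning than for rejection.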

+9
02 Oct '10 at 5:41

It is possible to have non-ASCII email addresses, as shown by this RFC: http://tools.ietf.org/html/rfc3490, but I think this has not been set up for all countries, and from what I understand only one language code will be allowed per country. There is also a way to convert such names to ASCII, but that is not a trivial problem.
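RFC 3490 (IDNA) covers the domain part of an address. As a small sketch of the ASCII conversion mentioned above, Python's built-in `idna` codec can convert an internationalized domain to its ASCII form. The helper name `domain_to_ascii` and the example domain are assumptions made for illustration.

```python
def domain_to_ascii(address: str) -> str:
    """Return the address with its domain converted to ASCII (Punycode) form."""
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("address contains no @")
    # The built-in "idna" codec applies the RFC 3490 ToASCII conversion per label.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


if __name__ == "__main__":
    print(domain_to_ascii("user@bücher.example"))
```

Only the domain is converted here. A non-ASCII local part is left untouched, since IDNA does not apply to it.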

+5
Oct 02 2018-10-10T00:

I have come across email addresses containing single quotes, though infrequently. We reject whitespace (although strictly speaking it is allowed), more than one "@" sign, and address strings shorter than five characters. I believe this solves more problems than it creates, and so far, over ten years and several hundred thousand addresses, it has worked to reject many garbage addresses. There is also a trigger to delete all email addresses on insert or update.

It is not possible to verify the authenticity of the email without a round trip to the owner, but at least we can reject data that is extremely suspicious.
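The three rejection rules from this answer could be sketched in Python like this. The function name `is_suspicious` is made up for illustration, and this is only a reading of the rules as stated, not the author's actual code.

```python
def is_suspicious(address: str) -> bool:
    """Flag addresses that break any of the three rules from the answer above."""
    has_whitespace = any(ch.isspace() for ch in address)
    too_many_ats = address.count("@") > 1
    too_short = len(address) < 5
    return has_whitespace or too_many_ats or too_short


if __name__ == "__main__":
    for candidate in ["o'brien@example.com", "john doe@example.com", "a@b@example.com", "a@b"]:
        print(candidate, is_suspicious(candidate))
```

Single quotes are deliberately not rejected, since such addresses do occur in practice.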

+2
Feb 26 '13 at 13:47


