Should Unicode be allowed in usernames?

Why do most (all?) sites only support ASCII usernames? Are there any security concerns if an administrator decides to start accepting Unicode usernames?

+44
security web-services unicode username
Aug 12 '10 at 18:10
8 answers

Homoglyphs. The user "cat" and "cat" are different strings in unicode, although they look the same. The first letter in the second "set" is the Russian "s" - "CYRILLIC SMALL LETTER ES", more precisely. The system cannot easily say that you are replacing a different username - different nicknames on the computer.

Edit: Preventing mixed scripts does not solve the problem. For example, “сосо” is pure Cyrillic and can be used to spoof the ASCII “coco”.
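The point in the edit above can be checked with a small sketch. The helper `scripts_used` is a hypothetical name, and guessing the script from the first word of the Unicode character name is only a rough heuristic, not a proper script property lookup.

```python
import unicodedata

def scripts_used(name):
    """Rough script guess: first word of each letter's Unicode character name."""
    return {unicodedata.name(ch).split()[0] for ch in name if ch.isalpha()}

latin = "coco"
cyrillic = "\u0441\u043e\u0441\u043e"  # CYRILLIC SMALL LETTER ES and O, renders like "coco"
mixed = "\u0441at"                     # Cyrillic "с" followed by Latin "at"

print(latin == cyrillic)       # the strings differ even though they look alike
print(scripts_used(latin))     # only Latin letters
print(scripts_used(cyrillic))  # only Cyrillic letters, so a mixed-script check passes
print(scripts_used(mixed))     # two scripts, which a mixed-script check would catch
```

A mixed-script filter rejects the third name but has no grounds to reject the second, which is exactly the gap the answer describes.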

In addition, there are the left-to-right and right-to-left override characters (and friends). Leave them unfiltered and they will ruin your entire page.
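One possible way to deal with override characters is to reject or strip everything in the Unicode general category Cf (format controls), which includes the directional overrides. This is a minimal sketch; the function names are made up for illustration, and the policy itself is an assumption rather than something the answer prescribes.

```python
import unicodedata

def has_format_controls(name):
    """True if the name contains any character of general category Cf."""
    return any(unicodedata.category(ch) == "Cf" for ch in name)

def strip_format_controls(name):
    """Return the name with all Cf characters removed."""
    return "".join(ch for ch in name if unicodedata.category(ch) != "Cf")

evil = "abc\u202edef"  # contains U+202E RIGHT-TO-LEFT OVERRIDE

print(has_format_controls(evil))          # the override is detected
print(strip_format_controls(evil))        # plain "abcdef" remains
print(has_format_controls("alice"))       # ordinary names are unaffected
```

Note that Cf also covers the zero-width joiner and non-joiner, which some scripts use legitimately, so a blanket strip may be too aggressive for some user bases.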

+54
Aug 12 '10 at 18:24

HTTP authentication? There may be problems sending a Unicode username (and/or password) over existing protocols. One case I have run into before is Basic authentication: there is no clear way to encode Unicode usernames and passwords in the Basic auth headers.
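The ambiguity can be illustrated with a short sketch: the same credentials produce different Basic headers depending on which character encoding the client happens to pick. The helper `basic_auth_header` and the sample credentials are hypothetical.

```python
import base64

def basic_auth_header(user, password, charset):
    """Build an Authorization header value, encoding the credentials with charset."""
    raw = f"{user}:{password}".encode(charset)
    return "Basic " + base64.b64encode(raw).decode("ascii")

user, password = "j\u00fcrgen", "secret"  # "jürgen"

latin1_header = basic_auth_header(user, password, "latin-1")
utf8_header = basic_auth_header(user, password, "utf-8")

print(latin1_header)
print(utf8_header)
print(latin1_header == utf8_header)  # differs for non-ASCII names
```

For pure ASCII credentials both encodings give the same bytes, which is why the problem only shows up once Unicode usernames are allowed.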

+6
Aug 12 2018-10-18

While it is entirely questionable why there should be a username at all, rather than just a password, to identify the user, I see no reason to refuse Unicode usernames.

More importantly, password checking should be language-agnostic: it should compare the keys pressed, regardless of the user's keyboard layout. This means that שלום and akuo would be treated as the same password. This matters because users usually cannot see the password characters they type, and they already get very angry when CAPS LOCK is on.
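A rough sketch of the idea: map each typed character back to the physical key that produced it before comparing (or hashing). The table below is deliberately partial and covers only the four keys from the example, based on the standard Hebrew layout; `HEBREW_TO_QWERTY` and `to_key_sequence` are hypothetical names.

```python
# Partial, illustrative map from Hebrew-layout characters to the QWERTY keys
# that sit on the same physical key. A real table would cover the whole layout.
HEBREW_TO_QWERTY = {
    "\u05e9": "a",  # SHIN
    "\u05dc": "k",  # LAMED
    "\u05d5": "u",  # VAV
    "\u05dd": "o",  # FINAL MEM
}

def to_key_sequence(typed):
    """Translate typed characters to the keys pressed; unknown characters pass through."""
    return "".join(HEBREW_TO_QWERTY.get(ch, ch) for ch in typed)

hebrew = "\u05e9\u05dc\u05d5\u05dd"  # the Hebrew word from the example

print(to_key_sequence(hebrew))
print(to_key_sequence(hebrew) == to_key_sequence("akuo"))
```

Since passwords are normally stored hashed, a scheme like this would have to normalize the input before hashing, both when the password is set and when it is checked.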

+5
Aug 23 2018-10-10T00:

While you can go ahead and enable Unicode, be aware that some usernames will not behave properly, because different cultures apply different rules to the same characters.

Consider the basic case of case-insensitive comparison: in Turkish, the usernames "ID1" and "id1" are different (Turkish has two different letters I, one dotted and one dotless, which gives two capital and two lowercase letters that do not follow the same casing rules as in English). So although a Turkish person can enter their name in their own language, the program will not treat the name as they expect - instead, it will undergo a strange transformation into mutant English.
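Python's built-in case conversions are locale-independent, which makes them a convenient way to see the mismatch described above. This is only a demonstration of the default Unicode behaviour, not a Turkish-aware comparison.

```python
dotless = "\u0131d1"  # LATIN SMALL LETTER DOTLESS I followed by "d1"
dotted = "id1"

print(dotless == dotted)                        # two distinct usernames
print(dotless.upper() == dotted.upper())        # uppercasing maps both to "ID1"
print(dotless.casefold() == dotted.casefold())  # case folding keeps them apart

capital_dotted = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
print(capital_dotted.lower() == "i")            # lowercases to "i" plus a combining dot
print(len(capital_dotted.lower()))              # two code points
```

So a site that compares usernames by uppercasing would merge two names a Turkish user considers different, while one that uses case folding would keep them distinct; neither matches Turkish casing rules exactly.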

Accented Latin characters in European languages have similar collisions, so their handling seems arbitrary depending on which language they were entered in. Other regions of the world share characters in the same way while the usage rules differ - in some cases, national and cultural animosity can leave people very angry when the characters in their username are treated as if they were written in the language of a hated enemy (because that happened to be the default for those characters).

+4
Aug 12 '10 at 18:52

Your observation is not always true. And the choice of ASCII is more a matter of human factors than of technical or security issues.

In most cases, it is simply for ease of programming. The programmer can never be sure whether all the software, libraries and utilities behind the website will break on some characters. Why put website development at risk when ASCII works well? In addition, some packaged web applications prevent Unicode from being used in usernames. This is why many websites only support ASCII usernames.

Theoretically, all current software can handle 8-bit data well. There are currently no issues with storage or transfer. Even where some protocols are not 8-bit clean, the data can be converted to UTF-7 or another encoding scheme.
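As a small illustration of the UTF-7 route mentioned above, Python's standard `utf-7` codec turns a non-ASCII name into bytes that fit a 7-bit channel and can be decoded back unchanged. The sample name is arbitrary.

```python
name = "\u7528\u6237"  # a two-character Chinese username

wire = name.encode("utf-7")   # 7-bit safe representation
print(wire)
print(wire.isascii())         # every byte is plain ASCII
print(wire.decode("utf-7") == name)  # the round trip is lossless
```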

Unicode does have some problems, mostly on the data-processing side: display, fonts, readiness of programs and libraries for characters outside the BMP, sorting, comparison, input methods, writing direction. Administrators may not be knowledgeable enough to handle them. Depending on the nature of the website this may be a problem, but mostly it is not.
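Two of the processing issues listed above, comparison and characters outside the BMP, can be seen in a few lines. This is a sketch using only the standard library; the sample strings are arbitrary.

```python
import unicodedata

# Comparison: the same visible name can be stored in two different forms.
composed = "caf\u00e9"     # "é" as a single code point
decomposed = "cafe\u0301"  # "e" followed by COMBINING ACUTE ACCENT

print(composed == decomposed)                                # raw comparison fails
print(unicodedata.normalize("NFC", decomposed) == composed)  # equal after normalization

# Outside the BMP: one character, but two UTF-16 code units.
astral = "\U0001F600"
print(len(astral))                           # one code point
print(len(astral.encode("utf-16-le")) // 2)  # a surrogate pair in UTF-16
```

Normalizing usernames to a single form before storing and comparing them avoids the first problem; the second matters mostly for software that counts or truncates strings in UTF-16 units.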

For administrators, it is not easy to type some exotic characters. This makes it difficult for an administrator to search for users. It is also difficult for an administrator to keep abusive usernames in foreign languages off a website.

However, it is not uncommon for Chinese usernames to be used on a Chinese site, and these are not always ASCII. Other cultures and languages do the same. Some global projects accept almost all kinds of Unicode characters. Wikipedia is an example.

+3
Aug 13 '10 at 10:28

Plain ASCII is rare, I would say. Often it is just that no one thinks about it, since Latin 1 is enough for Western Europe and the United States. Some databases distinguish between text in legacy character sets and Unicode (varchar vs. nvarchar), while in others a specific character set has to be configured.

Especially in the US, many people do not even notice that ASCII will not be enough. Some try to find excuses such as "users have to be able to type it" or similar, which are mostly bogus.

To answer your question, I doubt there are security considerations, except perhaps for impersonating other people's names using a different script (the names look the same, but one is Latin and one is Cyrillic - this has been done with URLs before). In general, I see this as an oversight by developers, who should probably know better.

+2
Aug 12 '10 at 18:18

I would say the big reason is the lack of Unicode support in most PHP installations. It is not easy to work with, so why bother if ASCII is sufficient to cover your entire user base?

-2
Aug 12 '10 at 18:16

Or we could just stop worrying about what a username looks like and whether we can pronounce or remember it. That should be the user's concern. If no one remembers you, that is your loss. As for spoofing names, it is almost inevitable anyway. And yet you rarely ever hear of username mix-ups.

Imagine a forum, and imagine someone posting with an account that LOOKS identical to yours. You get into trouble, say that you did not do it, and post a link to your history: see, the message is not there. Click on the profile of the guy who ACTUALLY posted it and, bam, you have his profile. He is exposed now.

Having the same name does not mean that you have the same user data. Any application that does not allow you to distinguish between two similar users is in any case unsatisfactory and needs to be rewritten.

-2
Aug 13 2018-10-12T00: