How to allow only a subset of UNICODE code pages when validating input?

Question

I am building a service that may "go international" into non-English-speaking markets. I don't want to restrict usernames to the ASCII character range; I would like users to be able to enter their "natural" username. OK, so use UNICODE (with, say, UTF-8 as the text encoding for usernames).

But! I don't want users to create "non-name" usernames made of "code" characters. For example, I don't want to allow a username such as √√√√√øøøøøøø.

Is there a list of UNICODE code points that I can test against (possibly with a regex) to accept or reject such a username?

Thanks!

+3

validation unicode username

z8000 Oct 6 '09 at 15:41

2 answers

In Python (per "Introductory checking Unicode text in free form in Python"):

import unicodedata

def only_letters(s):
    """
    Returns True if the input text consists of letters and ideographs only, False otherwise.
    """
    for c in s:
        cat = unicodedata.category(c)
        # Ll=lowercase letter, Lu=uppercase letter, Lo=other letter (ideographs, caseless scripts)
        if cat not in ('Ll', 'Lu', 'Lo'):
            return False
    return True

> only_letters('Bzdrężyło')
True
> only_letters('He7lo') # we don't allow digits here
False

0

kravietz Jun 15 '17 at 17:16

Lukáš Lalinský · Accepted Answer · 2009-10-06T15:51:28+0000

Unicode assigns every character a general category, so you can easily exclude the characters you don't want. How exactly depends on the language you use: some regex frameworks have built-in support for categories, and some do not.
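
As a rough illustration of the category approach, here is a minimal Python sketch that needs no regex support at all. The helper name `is_plausible_username` and the choice of allowed categories (letters `L*`, combining marks `M*`, decimal digits `Nd`) are assumptions made for this example, not a recommended policy; adjust them to your own rules.

```python
import unicodedata

# Assumption for this sketch: accept letters (L*), combining marks (M*)
# and decimal digits (Nd); everything else (symbols, punctuation,
# spaces, control characters) is rejected.
ALLOWED_MAJOR = ("L", "M")
ALLOWED_EXACT = ("Nd",)

def is_plausible_username(name):
    """Hypothetical helper: True if every character is in an allowed category."""
    # Normalize so composed and decomposed forms are treated alike.
    name = unicodedata.normalize("NFC", name)
    if not name:
        return False
    for ch in name:
        cat = unicodedata.category(ch)
        if cat[0] not in ALLOWED_MAJOR and cat not in ALLOWED_EXACT:
            return False
    return True
```

With these assumptions, `is_plausible_username('√√√√√øøøøøøø')` is rejected because `√` is in category `Sm` (math symbol), even though `ø` on its own is an ordinary lowercase letter (`Ll`). Regex engines that support Unicode properties express the same idea with classes such as `\p{L}`; Python's standard `re` module is one of those that do not, while the third-party `regex` package does.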
