Unicode name validation

Question

In ASCII, validating a name is not too complicated: just make sure all characters are alphabetic.

But what about Unicode (UTF-8)? How can I make sure that a given string contains no commas or underscores, including those outside the ASCII range?

(ideally in Python)

+3

python validation unicode character-properties

Gilbert Mar 09 '09 at 15:30

5 answers

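Maybe the unicodedata module is useful for this task, especially the category() function. For the existing Unicode categories, see unicode.org. Then you can filter on punctuation characters and so on.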
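
A minimal sketch of that filtering idea, assuming Python 3 (where every str is Unicode). The helper name find_disallowed and the choice to reject the major categories P (punctuation) and S (symbol) are illustrative assumptions, not part of the answer:

```python
import unicodedata

def find_disallowed(name, rejected_major=("P", "S")):
    """Return (char, category) pairs whose major category is rejected.

    Hypothetical helper: which categories to reject is an assumption.
    """
    return [
        (ch, unicodedata.category(ch))
        for ch in name
        if unicodedata.category(ch)[0] in rejected_major
    ]

# U+060C ARABIC COMMA and U+FF3F FULLWIDTH LOW LINE are non-ASCII
# relatives of "," and "_"; both are reported by the helper.
print(find_disallowed("Smith\u060C Jr"))
print(find_disallowed("a\uFF3Fb"))
```

Rejecting all of P would also reject hyphens and apostrophes, so real names such as Jean-Luc or O'Rourke would need those characters whitelisted separately.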

+5

unbeknown Mar 09 '09 at 15:39

Depending on how you define "name", you could check it against this regex:

^\w+$

However, this also allows digits and underscores. To rule them out, you can run a second test against:

[\d_]

Fail the check if this second pattern matches. The two tests can be combined as follows:

^(?:(?![\d_])\w)+$

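From the Python re documentation:

\w
When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database.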
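
As a rough illustration of the combined pattern, here is a sketch assuming Python 3, where str patterns are Unicode-aware by default (in Python 2 the UNICODE flag and unicode objects would be needed). The helper name is_letters_only is hypothetical:

```python
import re

# Same pattern as above: every character must be \w, but never a digit
# or an underscore.
NAME_RE = re.compile(r"^(?:(?![\d_])\w)+$")

def is_letters_only(name):
    # fullmatch is used because "$" alone also matches just before a
    # trailing newline.
    return NAME_RE.fullmatch(name) is not None

print(is_letters_only("Γεωργίου"))
print(is_letters_only("a_b"))
print(is_letters_only("Smith, Jr"))
```

Note that this pattern rejects spaces, hyphens and apostrophes as well, so it fits single name parts rather than full names.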

+1

Tomalak Mar 09 '09 at 15:35

This might be a step towards a solution:

import unicodedata
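# Characters allowed regardless of category, e.g. O'Rourke, Franklin D. Roosevelt
EXCEPTIONS = frozenset(u"'.")
# Allowed general categories: uppercase, lowercase and titlecase letters,
# dash punctuation (Jean-Luc) and space separators
CATEGORIES = frozenset(('Lu', 'Ll', 'Lt', 'Pd', 'Zs'))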

def test_unicode_name(unicode_name):
    return all(
      uchar in EXCEPTIONS
        or unicodedata.category(uchar) in CATEGORIES
      for uchar in unicode_name)

>>> test_unicode_name(u"Michael O'Rourke")
True
>>> test_unicode_name(u"Χρήστος Γεωργίου")
True
>>> test_unicode_name(u"Jean-Luc Géraud")
True

+1

tzot Mar 09 '09 at 20:21

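The letters attribute of the string module should give you what you want. It is locale-specific, so, as long as you know the language of the text being passed in, you can call setlocale() and validate against those characters.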

http://docs.python.org/library/string.html#module-string

0

Jarret Hardie Mar 09 '09 at 15:35

zgoda · Accepted Answer · 2009-03-09T15:46:43+0000

Just convert the bytestring (your UTF-8 data) to a unicode object and check whether all characters are alphabetic:

s.isalpha()

For bytestrings, this method is locale-dependent.
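
A short sketch of the decode-then-check step, assuming Python 3 (bytes in, str after decoding). The helper name is_alphabetic_utf8 and the decision to treat invalid UTF-8 as a failed validation are assumptions made for illustration:

```python
def is_alphabetic_utf8(raw):
    """Decode UTF-8 bytes and check that every character is a letter.

    Hypothetical helper; returns False for input that is not valid UTF-8.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # str.isalpha() is False for the empty string and for any character
    # that is not a letter, including non-ASCII commas and underscores.
    return text.isalpha()

print(is_alphabetic_utf8("Γεωργίου".encode("utf-8")))
print(is_alphabetic_utf8("a\u060Cb".encode("utf-8")))  # contains U+060C ARABIC COMMA
```

Since isalpha() also rejects spaces, hyphens and apostrophes, a full name such as Jean-Luc Géraud would have to be split into parts first or checked with a category whitelist instead.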
