Unicode name validation

In ASCII, validating a name is not too difficult: just make sure all the characters are alphabetic.

But what about Unicode (UTF-8)? How can I make sure that a given string contains no commas or underscores, including those outside the ASCII range?

(ideally in Python)

+3
5 answers

Just decode the byte string (your UTF-8) into a unicode object and check whether all characters are alphabetic:

s.isalpha()

This method is locale-dependent for byte strings.
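A minimal sketch of this approach, assuming Python 3, where `str` is already Unicode and `bytes.decode` does the conversion. The helper name `is_alphabetic_utf8` is made up for illustration. Note that `isalpha()` also rejects spaces, hyphens and apostrophes, so on its own it only suits single-word names.

```python
def is_alphabetic_utf8(raw: bytes) -> bool:
    """Decode UTF-8 bytes and check that every character is a letter."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8 at all
    # str.isalpha() is False for the empty string and for any non-letter.
    return text.isalpha()


print(is_alphabetic_utf8("G\u00e9raud".encode("utf-8")))   # True
print(is_alphabetic_utf8("Γεωργίου".encode("utf-8")))      # True
print(is_alphabetic_utf8("Smith,_Jr".encode("utf-8")))     # False
```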

+5

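Perhaps the unicodedata module is useful for this task, especially its category() function. The existing Unicode categories are described at unicode.org. You can then filter out punctuation characters and the like.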
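As a rough illustration of what `unicodedata.category()` reports, the sketch below lists every character that is not in a letter category (the two-letter codes starting with `L`). The helper name `non_letters` is hypothetical. It shows that commas and underscores outside ASCII, such as the fullwidth forms, fall into the same punctuation categories as their ASCII counterparts.

```python
import unicodedata


def non_letters(text: str) -> list:
    """Return (character, category) pairs for everything that is not a letter."""
    return [(ch, unicodedata.category(ch))
            for ch in text
            if not unicodedata.category(ch).startswith("L")]


print(non_letters("G\u00e9raud"))        # []
print(non_letters("a,b_c"))              # [(',', 'Po'), ('_', 'Pc')]
# U+FF0C FULLWIDTH COMMA and U+FF3F FULLWIDTH LOW LINE
print([cat for _, cat in non_letters("a\uff0cb\uff3fc")])  # ['Po', 'Pc']
```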

+5

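With the UNICODE flag set, this should match:

^\w+$

However, it also matches digits and the underscore:

[\d_]

These can be excluded with a negative lookahead:

^(?:(?![\d_])\w)+$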
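The lookahead pattern can be tried out as follows. This is only a sketch, assuming Python 3, where `\w` is already Unicode-aware for `str` patterns; `re.UNICODE` is passed purely to make that explicit. The name `NAME_RE` and the sample strings are illustrative.

```python
import re

# (?![\d_]) rejects a digit or underscore before \w consumes the character.
NAME_RE = re.compile(r"^(?:(?![\d_])\w)+$", re.UNICODE)

for candidate in ["G\u00e9raud", "Χρήστος", "R2D2", "snake_case", "a,b"]:
    print(candidate, bool(NAME_RE.match(candidate)))
```

Only the first two candidates match; the others contain a digit, an underscore, or a comma. Spaces, hyphens and apostrophes are rejected too, since `\w` does not cover them.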

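From the docs:

\w

When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale. If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in the Unicode character properties database.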
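The flag behaviour of `\w` can be checked directly. The sketch below assumes Python 3, where the defaults are reversed relative to the description above: Unicode matching is the default for `str` patterns, and `re.ASCII` restores the `[a-zA-Z0-9_]` behaviour.

```python
import re

word = "Χρήστος"

# Default for str patterns in Python 3: Unicode-aware \w.
print(bool(re.fullmatch(r"\w+", word)))              # True
# re.ASCII restricts \w to [a-zA-Z0-9_].
print(bool(re.fullmatch(r"\w+", word, re.ASCII)))    # False
# Digits and the underscore match \w either way.
print(bool(re.fullmatch(r"\w+", "a_1", re.ASCII)))   # True
```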

+1

:

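import unicodedata

# Characters allowed regardless of category,
# e.g. O'Rourke, Franklin D. Roosevelt
EXCEPTIONS = frozenset(u"'.")
# Lu/Ll/Lt: upper-, lower- and titlecase letters; Pd: dashes; Zs: spaces
CATEGORIES = frozenset(('Lu', 'Ll', 'Lt', 'Pd', 'Zs'))

def test_unicode_name(unicode_name):
    # Every character must be an exception or belong to an allowed category
    return all(
        uchar in EXCEPTIONS
        or unicodedata.category(uchar) in CATEGORIES
        for uchar in unicode_name)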

>>> test_unicode_name(u"Michael O'Rourke")
True
>>> test_unicode_name(u"Χρήστος Γεωργίου")
True
>>> test_unicode_name(u"Jean-Luc Géraud")
True

, , , .
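To see what the category check rejects, here is a self-contained sketch that restates the function under the hypothetical name `check_unicode_name` and feeds it a few problem cases. The NFC normalization step is an addition for illustration, not part of the answer above.

```python
import unicodedata

EXCEPTIONS = frozenset("'.")
CATEGORIES = frozenset(("Lu", "Ll", "Lt", "Pd", "Zs"))


def check_unicode_name(unicode_name):
    # Same logic as the function above, restated so this block runs alone.
    return all(
        uchar in EXCEPTIONS or unicodedata.category(uchar) in CATEGORIES
        for uchar in unicode_name)


print(check_unicode_name("Smith, John"))   # False: ',' is category Po
print(check_unicode_name("john_smith"))    # False: '_' is category Pc

# A decomposed accent (U+0301, category Mn) fails until the string
# is normalized to its composed form.
decomposed = "Ge\u0301raud"
print(check_unicode_name(decomposed))                                # False
print(check_unicode_name(unicodedata.normalize("NFC", decomposed)))  # True

# Letters from scripts without case are category Lo, which is not listed.
print(check_unicode_name("\u674e"))        # False
```

If names in caseless scripts should pass, adding `'Lo'` (and possibly `'Lm'`) to the category set would be one option.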

+1

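The letters constant of the string module should give you what you want, depending on the locale. Its value is locale-dependent and is updated when setlocale() is called.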

http://docs.python.org/library/string.html#module-string

, , "" , , "", . , ASCII .

0

Source: https://habr.com/ru/post/1704446/

