UTF-8 & IsAlpha() in PHP

I am working on an application that supports several languages. It tries to use the language requested by the browser and also allows this choice to be overridden manually. This part works great and selects the correct templates, labels, etc.

The user sometimes needs to enter text of their own, and that is where I run into problems, because the application must accept even "complex" languages such as Chinese and Russian. So far, I have taken care of the things mentioned in other posts, i.e.:

  • call mb_internal_encoding( 'UTF-8' )
  • declaring the correct encoding when rendering web pages using meta http-equiv=Content-Type content=text/html;charset=UTF-8 (format adapted due to Stack Overflow restrictions)
  • the content itself arrives correctly, because mb_detect_encoding() == UTF-8
  • tried setLocale(LC_CTYPE, "UTF-8"), which doesn't seem to work, because it requires choosing one language, which I cannot do since I need to support several. It still fails even if I force a locale manually as a test, i.e. with setLocale(LC_CTYPE, "zh__CN.utf8"): ctype_alpha() still fails for the Chinese text

It seems that even an explicit choice of language does not make ctype_alpha() useful.
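A minimal reproduction of the failure, assuming the default C locale or a UTF-8 locale (the sample strings are only illustrative). ctype_alpha() classifies one byte at a time, and every byte of a non-ASCII UTF-8 character is 0x80 or higher, so under those locales it is never seen as a letter:

```php
<?php
// Illustrative samples: ASCII, Cyrillic, and Chinese text, all UTF-8 encoded.
$samples = ['abc', 'Привет', '中文'];

$results = [];
foreach ($samples as $text) {
    // ctype_alpha() looks at single bytes, not at whole code points.
    $results[$text] = ctype_alpha($text);
}

// Only the pure-ASCII sample is reported as alphabetic.
var_dump($results);
```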

Therefore, the question arises: how should I check alphabetical characters in all languages?

The only idea I had at the moment was to manually check with arrays of "valid" characters, but that seems ugly, especially for the Chinese.

Please let me know how you solve this problem.

Many thanks!


Use Unicode character properties (this requires PCRE regex support with Unicode enabled):

// adjust pattern to your needs
// $input needs to be UTF-8 encoded
if (preg_match('/^\p{L}+$/u', $input)) {
    // OK
} else {
    // not OK
}

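\p{L} matches any character in the Unicode general category L (Letter), which covers Ll (lowercase letter), Lm (modifier letter), Lo (other letter), Lt (titlecase letter) and Lu (uppercase letter).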
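As a sketch only, the check can be wrapped in a small helper. The function name isUnicodeAlpha and its $allowMarks flag are hypothetical, not part of any library. Two variations on the pattern are assumed here: \A and \z instead of ^ and $, because a plain $ also matches just before a trailing newline, and an optional \p{M} class for combining marks, which some scripts use inside ordinary words.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Returns true only when $input is valid UTF-8 and consists entirely of letters.
function isUnicodeAlpha(string $input, bool $allowMarks = false): bool
{
    // \p{M} covers combining marks, e.g. a decomposed accent (e + U+0301).
    $class = $allowMarks ? '[\p{L}\p{M}]' : '\p{L}';

    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (for example when $input is not valid UTF-8 under the /u modifier).
    return preg_match('/\A' . $class . '+\z/u', $input) === 1;
}

var_dump(isUnicodeAlpha('中文'));      // bool(true)
var_dump(isUnicodeAlpha('Привет'));    // bool(true)
var_dump(isUnicodeAlpha('abc123'));    // bool(false)
var_dump(isUnicodeAlpha("abc\n"));     // bool(false)
```

Whether to allow marks, spaces, hyphens or apostrophes depends on what the field is meant to hold, so the character class should be adjusted accordingly.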




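You can read the browser's language preference from $_SERVER['HTTP_ACCEPT_LANGUAGE'], which contains something like:

de-de,de;q=0.8,en-us;q=0.5,en;q=0.3

Parse that value and pass the result to setLocale.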
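A rough sketch of parsing such a header value into language tags ordered by preference. The function name parseAcceptLanguage is hypothetical. Mapping a tag such as de-de to a locale name that setlocale() accepts is system-dependent and is deliberately left out.

```php
<?php
// Hypothetical helper: turn an Accept-Language header value into a list of
// lowercase language tags, highest quality (q) value first.
function parseAcceptLanguage(string $header): array
{
    $weights = [];
    foreach (explode(',', $header) as $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '') {
            continue; // skip empty entries such as a trailing comma
        }

        $q = 1.0; // the quality value defaults to 1 when omitted
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        $weights[$tag] = $q;
    }

    // Sort by weight, descending, keeping the tags as keys.
    // Ties keep header order on PHP 8.0+, where sorting is stable.
    arsort($weights);

    return array_keys($weights);
}

// In a web request the value would come from $_SERVER['HTTP_ACCEPT_LANGUAGE'].
print_r(parseAcceptLanguage('de-de,de;q=0.8,en-us;q=0.5,en;q=0.3'));
```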



The best approach is to use UTF-8 throughout your project: in your database, in your output, and as the expected encoding for input.

  • Output: Make sure that you encode your data as UTF-8 and declare this in the HTTP Content-Type header, not just in the document itself.
  • Input: If you use forms, declare the expected encoding in the accept-charset attribute.
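
The two points above might look roughly like this in PHP. All three function names are hypothetical helpers. The input check uses an empty pattern with the /u modifier as one common way to test for valid UTF-8; mb_check_encoding() would be an alternative when the mbstring extension is available.

```php
<?php
// Output: declare UTF-8 in the HTTP header itself, before any body is sent.
function sendUtf8ContentType(): void
{
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=UTF-8');
    }
}

// Input: build a form tag that asks the browser to submit UTF-8.
function utf8FormOpenTag(string $action): string
{
    $safeAction = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');
    return '<form method="post" action="' . $safeAction . '" accept-charset="UTF-8">';
}

// Input: never trust the declaration alone; verify what actually arrived.
// An empty pattern with /u fails (returns false) on invalid UTF-8.
function isValidUtf8(string $value): bool
{
    return preg_match('//u', $value) === 1;
}

echo utf8FormOpenTag('/save'), "\n";
var_dump(isValidUtf8('Привет'));
var_dump(isValidUtf8("\xC3\x28")); // a truncated two-byte sequence
```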

Source: https://habr.com/ru/post/1726184/

