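Arabic letters occupy the Unicode range 0600-06FF. Unicode defines further Arabic ranges as well: code range 0750-077F (Arabic Supplement), 08A0-08FF (Arabic Extended-A), and the presentation forms FB50-FDFF and FE70-FEFF, but the basic range is 0600-06FF.
Chinese (Han) characters in Unicode belong to the CJK Unified Ideographs block; the assigned range is 4E00-9FD5. Note that this identifies a script, not a language: the Unicode Consortium unified the Han characters, so the same range also covers characters used in Japanese and Korean text.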
So, if you only need to tell Arabic and Chinese scripts apart and don't want to use the approach suggested by troelskn (lists of common words for each language you want to identify, which does not scale well to a large number of languages), it is enough to check which code point ranges the characters of your input fall into. An earlier Stack Overflow question already covers methods for determining Unicode ranges in PHP.
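As a rough illustration of the range check, here is a minimal sketch in Python rather than PHP. The function names `in_ranges` and `detect_script`, the majority-count rule, and the tie-breaking are assumptions made for this example, not part of the original answer; only the code point ranges come from the text. In PHP the same ranges could presumably be tested with `preg_match` and the `u` modifier.

```python
# Inclusive code point ranges quoted in the answer.
ARABIC_RANGES = [
    (0x0600, 0x06FF),  # basic Arabic range
    (0x0750, 0x077F),
    (0x08A0, 0x08FF),
    (0xFB50, 0xFDFF),
    (0xFE70, 0xFEFF),
]
CHINESE_RANGES = [(0x4E00, 0x9FD5)]


def in_ranges(ch, ranges):
    """Return True if the character's code point lies in any (low, high) range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


def detect_script(text):
    """Hypothetical helper: classify text as 'arabic', 'chinese', or 'unknown'.

    Counts characters falling in each set of ranges and returns the script
    with more hits. A tie with at least one hit resolves to 'arabic'
    (an arbitrary choice made for this sketch).
    """
    arabic = sum(in_ranges(ch, ARABIC_RANGES) for ch in text)
    chinese = sum(in_ranges(ch, CHINESE_RANGES) for ch in text)
    if arabic == 0 and chinese == 0:
        return "unknown"
    return "arabic" if arabic >= chinese else "chinese"
```

Counting hits instead of stopping at the first match keeps the check tolerant of mixed input, such as Arabic text containing a few Latin or Chinese characters. Punctuation and digits outside the listed ranges are simply ignored.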