What are the Unicode ranges for Hindi accent characters?

I am trying to compile a Unicode list of all the “o” shapes in the Hindi character set. In fact, a list of any characters (in any language) that are used as separate characters to indicate accents would be even better.

I intend to use this Unicode list in a RegExp.

I tried to edit the list of character ranges by displaying them in an input TextField, but editing this text causes strange problems (the text cursor does not land on the correct character, the selection suddenly disappears or is offset incorrectly... in other words, HINDI HELL!)

I also tried this with Notepad++, but although it was more responsive, it eventually misbehaved just like the Flash Player text field. This happens especially when deleting the [] block characters (nulls?). Some of them cause odd behavior.

Anyway, all I want is a list of accents. A few examples are shown in the image below (but I need ALL accents):

[image]

Thanks!

+4
3 answers

You can find PDF code charts listing the Unicode ranges, grouped by script, here: http://unicode.org/charts/

For Hindi, you probably want Devanagari (U+0900 to U+097F) or Devanagari Extended (U+A8E0 to U+A8FF).

+5

Here is a character class for the Devanagari combining marks:

[\u0901\u0902\u0903\u093c\u093e\u093f\u0940\u0941\u0942\u0943\u0944\u0945\u0946\u0947\u0948\u0949\u094a\u094b\u094c\u094d\u0951\u0952\u0953\u0954\u0962\u0963]

This covers only the basic Devanagari block (not Devanagari Extended).
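A minimal sketch of how a class like this could be used, written in Python as an assumption (the question targets a Flash/ActionScript RegExp, where the same four-digit `\uXXXX` escapes should apply). The names `DEVANAGARI_MARKS`, `find_marks`, and `strip_marks` are illustrative, and the consecutive code points are collapsed into ranges:

```python
import re

# Same code points as the character class above, with runs of
# consecutive values written as ranges (e.g. \u093e-\u094d).
DEVANAGARI_MARKS = re.compile(
    r"[\u0901-\u0903\u093c\u093e-\u094d\u0951-\u0954\u0962\u0963]"
)


def find_marks(text):
    """Return every combining mark from the class found in text, in order."""
    return DEVANAGARI_MARKS.findall(text)


def strip_marks(text):
    """Return text with the marks from the class removed."""
    return DEVANAGARI_MARKS.sub("", text)


if __name__ == "__main__":
    word = "\u0939\u093f\u0902\u0926\u0940"  # the word "Hindi" in Devanagari
    print([hex(ord(ch)) for ch in find_marks(word)])
    print(strip_marks(word))
```

Building and testing the pattern from escapes in code, rather than pasting the marks into an editor, avoids the cursor and selection problems described in the question.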

+3

If you want a complete set (for all languages), you can do this programmatically. Start with the Unicode data file at ftp://ftp.unicode.org/Public/6.1.0/ucd/UnicodeData.txt, described by TR-44 ( http://unicode.org/reports/tr44/#Property_Definitions )

You can use the Canonical_Combining_Class field (see http://unicode.org/reports/tr44/#Canonical_Combining_Class_Values ) to filter for exactly the characters you want. I can't be more precise, because "accent" is a bit vague :-) You may also need to look at General_Category to get the filter right (and to exclude certain marks, symbols, or punctuation).

And a script will definitely do this better than fiddling with text editors. One characteristic of combining characters is that they combine :-) So you can get all kinds of puzzling results (for example: http://www.siao2.com/2006/02/17/533929.aspx :-)
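The filtering described above could be sketched roughly as follows, in Python as an assumption. In UnicodeData.txt each line is a semicolon-separated record whose first four fields are the code point (hex), the name, General_Category, and Canonical_Combining_Class. The `SAMPLE` string is a short hand-picked excerpt in that format, not the full file, and the function names and the exact filter (any category starting with `M`, or a non-zero combining class) are illustrative choices, not a definitive definition of "accent":

```python
# A few sample records in the UnicodeData.txt format (excerpt only).
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0301;COMBINING ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING ACUTE;;;;
0915;DEVANAGARI LETTER KA;Lo;0;L;;;;;N;;;;;
093C;DEVANAGARI SIGN NUKTA;Mn;7;NSM;;;;;N;;;;;
093E;DEVANAGARI VOWEL SIGN AA;Mc;0;L;;;;;N;;;;;
094D;DEVANAGARI SIGN VIRAMA;Mn;9;NSM;;;;;N;;;;;
"""


def parse_marks(lines):
    """Return code points whose record looks like a combining mark."""
    marks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code_point = int(fields[0], 16)
        category = fields[2]   # General_Category, e.g. Mn, Mc, Lo
        ccc = int(fields[3])   # Canonical_Combining_Class
        # Assumed filter: mark categories (Mn, Mc, Me) or a non-zero class.
        if category.startswith("M") or ccc != 0:
            marks.append(code_point)
    return marks


def to_char_class(code_points):
    """Format code points as a regex character class, merging runs into ranges."""
    def esc(cp):
        return "\\u%04x" % cp if cp <= 0xFFFF else "\\U%08x" % cp

    ranges = []
    for cp in sorted(set(code_points)):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp          # extend the current run
        else:
            ranges.append([cp, cp])     # start a new run
    parts = [esc(a) if a == b else esc(a) + "-" + esc(b) for a, b in ranges]
    return "[" + "".join(parts) + "]"


if __name__ == "__main__":
    print(to_char_class(parse_marks(SAMPLE.splitlines())))
```

Note that DEVANAGARI VOWEL SIGN AA has a combining class of 0, so filtering on Canonical_Combining_Class alone would miss it; that is why the sketch also checks General_Category. To run it against the real file, the lines of a downloaded UnicodeData.txt could be passed to `parse_marks` instead of `SAMPLE`.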

0

Source: https://habr.com/ru/post/1399260/

