Unicode Arabic character processing

How does Unicode know whether text should be read from right to left or from left to right?

I see this both in Word and in Python.

For instance,

هذا هو الملعون جيد رجل الصباح!

If you move the cursor back through it, it is read from right to left.

I printed the Unicode escape representation of the string, which is

 u'\u0647\u0630\u0627 \u0647\u0648 \u0627\u0644\u0645\u0644\u0639\u0648\u0646 \u062c\u064a\u062f \u0631\u062c\u0644 \u0627\u0644\u0635\u0628\u0627\u062d!' 

But I did not see anything in it that indicated right to left or left to right.

For regular strings like

 Hi how are you

it works from left to right.

Shouldn't there be a Unicode character or byte that indicates right to left, or something similar?

+5
2 answers

There is. Here is a comedic explanation of it: https://www.explainxkcd.com/wiki/index.php/1137:_RTL

In Unicode, the RLM character is encoded at U+200F RIGHT-TO-LEFT MARK (HTML &#8207; · &rlm;). In UTF-8 it is E2 80 8F. Its usage is prescribed in the Unicode Bidirectional Algorithm. LRM is encoded at U+200E LEFT-TO-RIGHT MARK.

https://en.wikipedia.org/wiki/Right-to-left_mark

The bidirectional algorithm is described here: http://unicode.org/reports/tr9/

Specifically for Arabic there is also ALM, U+061C ARABIC LETTER MARK, a zero-width right-to-left formatting character that behaves like an Arabic letter.
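As a quick check of the code points and UTF-8 bytes mentioned in this answer, here is a minimal sketch using Python 3's standard `unicodedata` module. The `MARKS` dictionary and the `describe` helper are names made up for this illustration, and the reported names and bidi classes depend on the Unicode version bundled with your Python (ALM needs a reasonably recent one).

```python
import unicodedata

# The three directional marks named in this answer.
MARKS = {
    "RLM": "\u200f",
    "LRM": "\u200e",
    "ALM": "\u061c",
}

def describe(ch):
    """Return (code point, Unicode name, bidi class, UTF-8 bytes as hex)."""
    utf8_hex = " ".join("%02X" % b for b in ch.encode("utf-8"))
    return (
        "U+%04X" % ord(ch),
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
        utf8_hex,
    )

for label, ch in MARKS.items():
    print(label, *describe(ch))
```

All three are invisible, zero-width characters, so they only show up when you inspect the string's code points like this.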

+1

The writing direction is a property of each Unicode character. Unicode defines a complex set of properties for each code point (for example, whether it is a digit, a mathematical symbol, or a letter; its case; its directionality; which block it belongs to, which indirectly determines its script; and so on).

For example, see http://www.fileformat.info/info/unicode/char/0647/index.htm (this is the first character in your example), which lists the bidi (bidirectional) property [AL]. This encodes "Right-to-Left Arabic" as the writing direction for this character.

There are Unicode characters that explicitly specify the direction of the text, but they are usually not necessary or useful. The text renderer should already know, for each character it displays, from its Unicode properties, which direction it requires (although text converted from other encodings may contain explicit direction indicator codes).
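The same property can be read from Python. This is a small sketch assuming Python 3 and the standard `unicodedata` module; `EXAMPLE` is the string from the question and `bidi_classes` is a helper name invented here.

```python
import unicodedata

# The string from the question, written with escapes so the source stays ASCII.
EXAMPLE = (
    "\u0647\u0630\u0627 \u0647\u0648 "
    "\u0627\u0644\u0645\u0644\u0639\u0648\u0646 "
    "\u062c\u064a\u062f \u0631\u062c\u0644 "
    "\u0627\u0644\u0635\u0628\u0627\u062d!"
)

def bidi_classes(text):
    """Return the bidi class of every character in text, in logical order."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Print each character's code point next to its bidi class.
for ch, cls in zip(EXAMPLE, bidi_classes(EXAMPLE)):
    print("U+%04X" % ord(ch), cls)
```

The Arabic letters report AL, the spaces report WS (whitespace), and the exclamation mark reports ON (other neutral), so the direction comes from the letters themselves rather than from any extra marker in the string.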
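To illustrate how a renderer could pick a direction from these properties alone, here is a deliberately simplified sketch of the "first strong character" idea used by the bidirectional algorithm to choose a paragraph's base direction. It is not the full algorithm from http://unicode.org/reports/tr9/: it ignores explicit embeddings, overrides and isolates, and `base_direction` is a name invented for this example.

```python
import unicodedata

def base_direction(text):
    """Guess the paragraph direction from the first strongly directional character.

    Simplification: only the strong classes L, R and AL are considered;
    explicit embedding, override and isolate controls are not handled.
    """
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    # No strong character found: fall back to left-to-right.
    return "ltr"

print(base_direction("\u0647\u0630\u0627 \u0647\u0648"))  # Arabic letters
print(base_direction("Hi how are you"))                   # Latin letters
print(base_direction("\u200fHi how are you"))             # RLM placed first
```

The third call shows where an explicit mark matters: a leading U+200F is itself a strong right-to-left character, so it changes the guessed direction even though the visible text is Latin.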

+2

Source: https://habr.com/ru/post/1246322/
